Abstract
Deep generative models have shown significant promise in improving performance in design space exploration, but their interpretability remains poorly understood. Interpretability is a necessity when model explanations are desired and problems are ill-defined, and it involves learning the design features behind design performance, a process called designer learning. This study explores how human–machine collaboration affects designer learning and design performance. We conduct an experiment (N = 42) in which subjects design mechanical metamaterials using a conditional variational autoencoder. The independent variables are: (i) the level of automation of design synthesis, i.e., manual (the user directly manipulates design variables), manual feature-based (the user manipulates the weights of the features learned by the encoder), and semi-automated feature-based (the agent generates a local design from a start design and a user-selected step size); and (ii) feature semanticity, i.e., meaningful versus abstract features. We assess feature-specific learning using item response theory and design performance using utopia distance and hypervolume improvement. The results suggest that design performance depends on the subjects' feature-specific knowledge, emphasizing the precursory role of learning. Semi-automated synthesis locally improves utopia distance, but it does not yield higher global hypervolume improvement than manual design synthesis, and it reduces designer learning compared to manual feature-based synthesis. The subjects learn semantic features better than abstract features only when design performance is sensitive to them. Potential cognitive constructs influencing learning in human–machine collaborative settings, such as cognitive load and recognition heuristics, are discussed.
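To make the two performance metrics concrete, here is a minimal sketch of utopia distance and two-objective hypervolume improvement under a minimization convention; the function names and the rectangle-sweep hypervolume routine are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def utopia_distance(objectives, utopia):
    # Euclidean distance from a design's objective vector to the utopia
    # point; smaller is better under a minimization convention.
    return float(np.linalg.norm(np.asarray(objectives) - np.asarray(utopia)))

def hypervolume_improvement_2d(front, candidate, ref):
    # HVI = HV(front + candidate) - HV(front) for two minimized objectives,
    # with hypervolume measured against a reference point `ref` that every
    # point is assumed to dominate.
    def hv(points):
        total, prev_f2 = 0.0, ref[1]
        for f1, f2 in sorted(points):  # ascending in the first objective
            if f2 < prev_f2:           # skip dominated points
                total += (ref[0] - f1) * (prev_f2 - f2)
                prev_f2 = f2
        return total
    return hv(list(front) + [candidate]) - hv(list(front))

front = [(1.0, 3.0), (3.0, 1.0)]
print(utopia_distance((1.0, 3.0), utopia=(0.0, 0.0)))                 # ~3.16
print(hypervolume_improvement_2d(front, (1.5, 1.5), ref=(4.0, 4.0)))  # 2.25
```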
Linear Classifiers that Encourage Constructive Adaptation
Machine learning systems are often used in settings where individuals adapt their features to obtain a desired outcome. In such settings, strategic behavior leads to a sharp loss in model performance in deployment. In this work, we address this problem by learning classifiers that encourage decision subjects to change their features in a way that improves both the predicted and the true outcome. We frame the dynamics of prediction and adaptation as a two-stage game and characterize optimal strategies for the model designer and its decision subjects. In benchmarks on simulated and real-world datasets, we find that classifiers trained using our method maintain the accuracy of existing approaches while inducing higher levels of improvement and less manipulation.
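As a rough illustration of the adaptation stage of this two-stage game, the sketch below computes a decision subject's best response to a published linear classifier, assuming a Euclidean movement cost and a fixed budget; the cost model, budget, and function names are illustrative, not the paper's exact formulation.

```python
import numpy as np

# Stage 1: the designer publishes a linear classifier sign(w @ x + b).
# Stage 2: a subject moves x the shortest distance needed to be classified
# positively, provided the movement cost stays within budget.
def best_response(x, w, b, budget):
    score = w @ x + b
    if score >= 0:
        return x  # already accepted; no adaptation needed
    delta = -score / (w @ w) * w      # minimal-norm move to the boundary
    cost = np.linalg.norm(delta)      # assumed cost: Euclidean distance
    return x + delta if cost <= budget else x

w, b = np.array([1.0, 2.0]), -3.0
x = np.array([0.5, 0.5])              # initially rejected (score = -1.5)
x_new = best_response(x, w, b, budget=1.0)
print(x_new, w @ x_new + b)           # lands exactly on the boundary
```

Whether such a move constitutes genuine improvement or mere manipulation depends on whether the changed features affect the true outcome rather than only the prediction, which is the distinction the classifier-design stage must account for.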
- Award ID(s): 2023495
- PAR ID: 10282780
- Date Published:
- Journal Name: Algorithmic Recourse workshop at ICML'21
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- With recent advances in Deep Learning (DL) models, the healthcare domain has seen increased adoption of neural networks for clinical diagnosis, monitoring, and prediction. DL models have been developed for various tasks using 1D (one-dimensional) time-series signals. Time-series healthcare data, typically collected through sensors, have specific structures and characteristics such as frequency and amplitude. The nature of these features, including sampling rates that vary with the sensing instruments, poses challenges in handling them. Electrocardiograms (ECG), a class of 1D time-series signals representing the electrical activity of the heart, have been used to develop decision support systems for heart condition classification. The sampling rate of these signals, influenced by different ECG instruments and their calibrations, can greatly impact the learned functions of DL models and, subsequently, their decision outcomes. This hinders the development and deployment of generalized, DL-based ECG classifiers that can work with data from a variety of ECG instruments, particularly when the sampling rate of the training data remains unknown to users. Moreover, DL models are not designed to recognize the sampling rate of the test data on which they are deployed, further complicating their application across diverse clinical settings. In this study, we investigated the effect of different sampling rates of time-series ECG signals on DL-based ECG classifiers. To the best of our knowledge, this is the first work to examine how varying sampling rates affect the performance of DL-based models for classifying 1D time-series ECG signals. Through comprehensive experiments, we showed that accuracy can drop by as much as 20% when the training and testing sampling rates differ. We provide visual explanations, via activation maps, of the differences in learned model features when the sampling rates of training and testing data differ. We also investigated potential mitigation strategies: (i) transfer learning, (ii) resampling, and (iii) training a DL model on ECG data at multiple sampling rates. A minimal resampling sketch follows this list.
- Automated decision-making systems are increasingly deployed in domains such as hiring and credit approval, where negative outcomes can have substantial ramifications for decision subjects. Thus, recent research has focused on providing explanations that help decision subjects understand the decision system and enable them to take actionable recourse to change their outcome. Popular counterfactual explanation techniques aim to achieve this by describing alterations to an instance that would transform a negative outcome into a positive one. Unfortunately, little user evaluation has been performed to assess which of the many counterfactual approaches best achieve this goal. In this work, we conduct a crowd-sourced between-subjects user study (N = 252) to examine the effects of counterfactual explanation type and presentation on lay decision subjects' understanding of automated decision systems. We find that the region-based counterfactual type significantly increases objective understanding, subjective understanding, and response confidence compared to the point-based type. We also find that counterfactual presentation significantly affects response time and moderates the effect of counterfactual type on response confidence, but not understanding. A qualitative analysis reveals how decision subjects interact with different explanation configurations and highlights unmet needs for explanation justification. Our results provide valuable insights and recommendations for developing counterfactual explanation techniques that achieve practical actionable recourse and empower lay users to seek justice and opportunity in automated decision workflows. A point-based counterfactual search sketch follows this list.
- Student engagement is a key component of learning and teaching, and a plethora of automated methods have been proposed to measure it. Whereas most of the literature explores student engagement during computer-based learning, often in the lab, we focus on classroom instruction in authentic learning environments. We collected audiovisual recordings of secondary school classes over a one-and-a-half-month period, acquired continuous engagement labels per student (N = 15) in repeated sessions, and explored computer vision methods to classify engagement from facial videos. We learned deep embeddings for attentional and affective features by training Attention-Net for head pose estimation and Affect-Net for facial expression recognition on previously collected large-scale datasets. We used these representations to train engagement classifiers on our data, in single- and multi-channel settings, accounting for temporal dependencies. The best-performing engagement classifiers achieved student-independent AUCs of .620 and .720 for grades 8 and 12, respectively, with attention-based features outperforming affective features. Score-level fusion either improved the engagement classifiers or was on par with the best-performing modality. We also investigated the effect of personalization and found that only 60 seconds of person-specific data, selected by the margin uncertainty of the base classifier, yielded an average AUC improvement of .084. A margin-uncertainty selection sketch follows this list.
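For the ECG item above, here is a minimal sketch of the resampling mitigation: converting a test-time signal to the sampling rate the model was trained on before inference. The function and variable names are illustrative assumptions, not the study's code.

```python
import numpy as np
from scipy.signal import resample

def match_sampling_rate(ecg, source_hz, target_hz):
    # Fourier-based resampling: rescale the sample count so the signal
    # spans the same duration at the target rate.
    n_out = int(round(len(ecg) * target_hz / source_hz))
    return resample(ecg, n_out)

ecg_500 = np.sin(np.linspace(0, 4 * np.pi, 5000))  # 10 s recorded at 500 Hz
ecg_250 = match_sampling_rate(ecg_500, source_hz=500, target_hz=250)
print(len(ecg_250))  # 2500 samples, i.e., 10 s at 250 Hz
```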
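For the counterfactual-explanation item, below is a growing-spheres-style sketch of a point-based counterfactual search: sample perturbations at increasing radii around a rejected instance and return the closest sample that flips the model's decision. This is an illustrative baseline, not one of the specific explanation types evaluated in the study.

```python
import numpy as np

def point_counterfactual(predict, x, radii, n_samples=500, seed=0):
    # predict: maps an (n, d) array of instances to (n,) class labels.
    rng = np.random.default_rng(seed)
    original = predict(x[None, :])[0]
    for r in radii:  # expand the search radius gradually
        candidates = x + rng.normal(scale=r, size=(n_samples, x.size))
        flipped = candidates[predict(candidates) != original]
        if len(flipped):
            # Closest decision-flipping candidate to the original instance.
            return flipped[np.argmin(np.linalg.norm(flipped - x, axis=1))]
    return None  # no counterfactual found within the largest radius

predict = lambda X: (X.sum(axis=1) > 1.0).astype(int)  # toy classifier
cf = point_counterfactual(predict, np.array([0.2, 0.2]), radii=[0.1, 0.3, 0.9])
print(cf, predict(cf[None, :]))  # a nearby instance with a flipped label
```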
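Finally, for the engagement item, this is a minimal sketch of margin-uncertainty selection: choosing the person-specific windows on which the base classifier is least decisive. The window granularity and names are illustrative assumptions.

```python
import numpy as np

def select_by_margin(probs, n_select):
    # probs: (n_windows, n_classes) predicted probabilities per window.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]      # small margin = high uncertainty
    return np.argsort(margin)[:n_select]  # the n_select most uncertain

probs = np.array([[0.90, 0.10], [0.55, 0.45], [0.60, 0.40]])
print(select_by_margin(probs, n_select=2))  # -> [1 2]
```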