- NSF-PAR ID:
- 10422632
- Editor(s):
- Obradovic, Zoran
- Date Published:
- Journal Name:
- Big Data
- ISSN:
- 2167-6461
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Recently, there has been growing interest in developing machine learning (ML) models that promote fairness, i.e., that avoid biased predictions toward certain populations (e.g., individuals from a specific demographic group). Most existing works learn such models by imposing well-designed fairness constraints during optimization. However, in many practical ML tasks only a few labeled samples can be collected, which can lead to poor fairness performance. Existing fairness constraints are designed to restrict the prediction disparity among different sensitive groups, but with few samples the disparity is difficult to measure accurately, which renders fairness optimization ineffective. In this paper, we define the task of fairness-aware learning with limited training samples as the fair few-shot learning problem. To address this problem, we devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks. To compensate for insufficient training samples, we propose a key strategy to select and leverage an auxiliary set for each meta-test task. These auxiliary sets contain several labeled training samples that can improve fairness performance in meta-test tasks, thereby allowing useful fairness-oriented knowledge to be transferred to them. Furthermore, we conduct extensive experiments on three real-world datasets to validate the superiority of our framework over state-of-the-art baselines.
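The claim that disparity is hard to measure from few samples can be checked with a small simulation. The sketch below assumes demographic parity (the gap in positive-prediction rates between two sensitive groups) as the disparity measure; the abstract does not name a specific metric, and `demographic_parity_gap` and `mean_estimated_gap` are hypothetical helper names. Both groups are simulated with the same true positive rate, so the true gap is zero and any measured gap is estimation noise.

```python
import random
from typing import Sequence


def demographic_parity_gap(preds: Sequence[int], groups: Sequence[int]) -> float:
    """Absolute difference in positive-prediction rates between sensitive groups 0 and 1."""
    rates = []
    for g in (0, 1):
        member_preds = [p for p, s in zip(preds, groups) if s == g]
        if not member_preds:
            raise ValueError(f"no samples for sensitive group {g}")
        rates.append(sum(member_preds) / len(member_preds))
    return abs(rates[0] - rates[1])


def mean_estimated_gap(n_per_group: int, trials: int = 1000, seed: int = 0) -> float:
    """Average measured gap when both groups share the SAME true positive rate (0.5).

    The true disparity is 0, so any nonzero value is pure estimation noise.
    """
    rng = random.Random(seed)
    groups = [0] * n_per_group + [1] * n_per_group
    total = 0.0
    for _ in range(trials):
        # Hypothetical model outputs: each prediction is positive with probability 0.5.
        preds = [int(rng.random() < 0.5) for _ in range(2 * n_per_group)]
        total += demographic_parity_gap(preds, groups)
    return total / trials


if __name__ == "__main__":
    for n in (4, 40, 400):
        print(f"{n:>3} samples per group -> mean measured gap {mean_estimated_gap(n):.3f}")
```

Analytically, the expected measured gap is about 0.27 with four samples per group and roughly 0.03 with 400, even though the true gap is zero; a constraint that penalizes the measured gap in the few-shot regime is therefore mostly chasing noise.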
-
Although ecosystems respond to global change at regional to continental scales (i.e., macroscales), model predictions of ecosystem responses often rely on data from targeted monitoring of a small proportion of sampled ecosystems within a particular geographic area. In this study, we examined how the sampling strategy used to collect data for such models influences predictive performance. We subsampled a large and spatially extensive data set to investigate how macroscale sampling strategy affects prediction of ecosystem characteristics in 6,784 lakes across a 1.8-million-km² area. We estimated model predictive performance for different subsets of the data set to mimic three common sampling strategies for collecting observations of ecosystem characteristics: random sampling design, stratified random sampling design, and targeted sampling. We found that sampling strategy influenced model predictive performance such that (1) stratified random sampling designs did not improve predictive performance compared with simple random sampling designs and (2) although one of the scenarios that mimicked targeted (non-random) sampling had the poorest-performing predictive models, the other targeted sampling scenarios resulted in models with predictive performance similar to that of the random sampling scenarios. Our results suggest that although potential biases in data sets from some forms of targeted sampling may limit predictive performance, compiling existing spatially extensive data sets can result in models with good predictive performance that may inform a wide range of science questions and policy goals related to global change.
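The three sampling strategies compared in the study can be sketched as subsampling functions over a list of records. This is a minimal illustration under stated assumptions, not the study's procedure: the stratum key (a `region` label), the targeting rule (keep the largest `area` values), and all function names are hypothetical, and the demo data are synthetic random numbers, not the lake data set.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List

Record = Dict[str, Any]


def simple_random_sample(records: List[Record], n: int, rng: random.Random) -> List[Record]:
    """Every record has the same chance of being selected."""
    return rng.sample(records, n)


def stratified_random_sample(
    records: List[Record], n: int, stratum: Callable[[Record], Hashable], rng: random.Random
) -> List[Record]:
    """Random sampling within strata, with sample sizes proportional to stratum size.

    Rounding means the total can differ slightly from n for uneven strata.
    """
    strata: Dict[Hashable, List[Record]] = defaultdict(list)
    for r in records:
        strata[stratum(r)].append(r)
    sample: List[Record] = []
    for members in strata.values():
        k = round(n * len(members) / len(records))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


def targeted_sample(records: List[Record], n: int, priority: Callable[[Record], float]) -> List[Record]:
    """Non-random selection: keep the n records a monitoring program would prioritize."""
    return sorted(records, key=priority, reverse=True)[:n]


def mean_area(rs: List[Record]) -> float:
    return sum(r["area"] for r in rs) / len(rs)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for a lake data set: a region label and a right-skewed "area" value.
    lakes = [{"region": rng.randrange(4), "area": rng.lognormvariate(0, 1)} for _ in range(6784)]
    print("population    ", round(mean_area(lakes), 3))
    print("simple random ", round(mean_area(simple_random_sample(lakes, 500, rng)), 3))
    print("stratified    ", round(mean_area(stratified_random_sample(lakes, 500, lambda r: r["region"], rng)), 3))
    print("targeted      ", round(mean_area(targeted_sample(lakes, 500, lambda r: r["area"])), 3))
```

The targeted sample deliberately over-represents large values, which is the kind of bias that can limit predictive performance for some targeted scenarios; whether it does in practice depends on how the targeting criterion relates to the predicted characteristic.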
-
This paper studies a category of visual question answering tasks in which answering the questions requires access to external knowledge. This category is called outside-knowledge visual question answering (OK-VQA). A major step in developing OK-VQA systems is retrieving documents relevant to a given multimodal query. The current state-of-the-art dense retrieval model for this task uses an asymmetric architecture with a multimodal query encoder and a unimodal document encoder. Such an architecture requires a large amount of training data to perform well. We propose an automatic data generation pipeline for pre-training passage retrieval models for OK-VQA tasks. The proposed approach yields a 26.9% improvement in Precision@5 over the current state of the art. Additionally, the proposed pre-training approach performs well in zero-shot retrieval scenarios.
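Precision@5, the metric behind the reported gain, is the fraction of the top five retrieved passages that are relevant, averaged over queries. The sketch below computes it; the function names and passage IDs are hypothetical. The abstract does not say whether the 26.9% gain is relative or absolute, so the sketch only computes the metric itself.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved passages that are relevant.

    Divides by k even if fewer than k passages were returned (a common convention).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average P@k over queries; each run is (ranked passage ids, relevant passage ids)."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in runs]
    if not scores:
        raise ValueError("no queries to evaluate")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    runs = [
        (["p1", "p7", "p3", "p9", "p2"], {"p7", "p2"}),  # 2 of top 5 relevant
        (["p4", "p5", "p6", "p8", "p0"], {"p4", "p5", "p6", "p8", "p0"}),  # all relevant
    ]
    print(mean_precision_at_k(runs))
```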
-
In an urban corridor with mixed traffic composed of connected and automated vehicles (CAVs) and human-driven vehicles (HDVs), vehicle operations are shaped both by individual driving behaviors and by the presence of signalized intersections. A coordinated control strategy that accommodates both factors is therefore needed to improve overall traffic flow quality. This study proposes a bi-level structure that decouples the joint effects of vehicular driving behaviors and corridor signal offset settings, with the objective of optimizing both average travel time (ATT) and average fuel consumption (AFC). At the lower level, three types of car-following models that account for driving modes are presented to describe the desired driving behaviors of HDVs and CAVs. In addition, a trigonometric function method combined with a rolling horizon scheme is proposed to generate eco-trajectories for CAVs in mixed traffic flow. At the upper level, a multi-objective optimization model for corridor signal offsets is formulated to minimize ATT and AFC based on the lower-level simulation outputs. A revised Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is then adopted to identify the set of Pareto-optimal solutions for corridor signal offsets under different CAV penetration rates (CAV PRs). Numerical experiments are conducted on a corridor with three signalized intersections. The proposed eco-driving strategy is first validated in single-vehicle simulation against the intelligent driver model (IDM) and the green light optimal speed advisory (GLOSA) algorithm; the results show that it reduces travel time and fuel consumption compared with both IDM and GLOSA. The effectiveness of the proposed coordinated control strategy is then validated across various CAV PRs. Results indicate that the optimal AFC can be reduced by 4.1%–32.2% as CAV PRs vary from 0.2 to 1, and the optimal ATT can be reduced by up to 2.3%. Furthermore, a sensitivity analysis is conducted to evaluate the impact of CAV PRs and V/C ratios on the optimal ATT and AFC.
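The core step NSGA-II relies on to rank candidate signal-offset plans is non-dominated sorting over the two minimized objectives, ATT and AFC. The sketch below shows only that step in a simple form; it omits crowding distance, selection, crossover, and mutation, and it does not reproduce the paper's revised NSGA-II, whose changes the abstract does not describe. Function names and the objective values are hypothetical.

```python
from typing import List, Tuple

# One candidate signal-offset plan evaluated by the lower-level simulation:
# (average travel time, average fuel consumption); both objectives are minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(points: List[Objectives]) -> List[List[Objectives]]:
    """Peel off successive Pareto fronts; front 0 is the Pareto-optimal set."""
    remaining = list(points)
    fronts: List[List[Objectives]] = []
    while remaining:
        # A point is in the current front if nothing still remaining dominates it.
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts


if __name__ == "__main__":
    # Hypothetical (ATT in s, AFC in L) values for five offset plans, not results from the study.
    candidates = [(100.0, 5.0), (110.0, 4.0), (105.0, 6.0), (120.0, 3.5), (115.0, 4.5)]
    for rank, front in enumerate(non_dominated_fronts(candidates)):
        print(rank, front)
```

Every plan in the first front represents a different ATT/AFC trade-off, which is why the study reports a set of Pareto-optimal offsets per CAV penetration rate rather than a single best plan.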
-
A training process for facial expression recognition is usually performed sequentially in three separate stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to find an optimal combination of feature representation, feature set, and classifier that achieves good recognition performance. This paper presents a novel Boosted Deep Belief Network (BDBN) that performs the three training stages iteratively in a unified loopy framework. Through the proposed BDBN framework, a set of features that effectively characterizes expression-related changes in facial appearance and shape can be learned and selected to form a boosted strong classifier in a statistical way. As learning continues, the strong classifier is improved iteratively and, more importantly, the discriminative capabilities of the selected features are strengthened according to their relative importance to the strong classifier through a joint fine-tuning process in the BDBN framework. Extensive experiments on two public databases showed that the BDBN framework yielded dramatic improvements in facial expression analysis.
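The idea of selecting features and weighting them into a boosted strong classifier can be illustrated with standard discrete AdaBoost over single-feature decision stumps. This is a swapped-in textbook technique, not BDBN: in BDBN the candidate features are learned by a deep belief network and jointly fine-tuned, which this sketch does not attempt. It assumes fixed numeric features and labels in {-1, +1}; all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A decision stump looks at ONE feature: (feature index, threshold, polarity).
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def train_adaboost(X: Sequence[Sequence[float]], y: Sequence[int], rounds: int) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost: each round selects the single best feature/threshold under current sample weights."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        best: Stump = (0, 0.0, 1)
        best_err = float("inf")
        for j in range(len(X[0])):  # feature selection happens here
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y) if stump_predict((j, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (j, thr, pol), err
        err = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner in the strong classifier
        ensemble.append((alpha, best))
        # Re-weight samples: misclassified ones gain weight, so the next round focuses on them.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Strong classifier: sign of the alpha-weighted vote of the selected stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [[0.1, 5.0], [0.4, 1.0], [0.6, 4.0], [0.9, 2.0]]  # toy features, not facial data
    y = [-1, -1, 1, 1]
    model = train_adaboost(X, y, rounds=3)
    print(model[0], [predict(model, x) for x in X])
```

The per-stump weight `alpha` plays the role of a feature's "relative importance to the strong classifier"; BDBN additionally feeds that importance back into feature learning, which plain AdaBoost on fixed features cannot do.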