The spread of infectious diseases is a highly complex spatiotemporal process that is difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models, which typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to understand how bias can be mitigated across the AI disease modeling pipeline: in the input data, in the models themselves, and in the outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground-truth data can be obtained. From this complete dataset (which may not reflect the real world), we can sample and inject different types of bias. By using the sampled data in which bias is known (since it is given as a simulation parameter), we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions.
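As a rough illustration of this vision, the sketch below shows how a known sampling bias might be injected into fully observed, simulated mobility data so that the bias is a controlled parameter rather than an unknown. The group labels, observation rates, and data fields are hypothetical and are not taken from the project.

```python
# Minimal sketch (hypothetical names and parameters): inject a known,
# group-dependent observation bias into simulated ground-truth data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated "ground truth": every individual's trips are fully observed.
ground_truth = pd.DataFrame({
    "person_id": np.arange(10_000),
    "group": rng.choice(["urban", "rural"], size=10_000, p=[0.7, 0.3]),
    "daily_trips": rng.poisson(lam=3.0, size=10_000),
})

def biased_sample(df: pd.DataFrame, observation_rate: dict) -> pd.DataFrame:
    """Keep each row with a group-dependent probability, mimicking the
    uneven coverage of app-based or mobile-phone mobility data."""
    keep_prob = df["group"].map(observation_rate)
    return df[rng.random(len(df)) < keep_prob]

# Urban residents are observed three times more often than rural residents.
observed = biased_sample(ground_truth, {"urban": 0.6, "rural": 0.2})

# Because the observation rates are simulation parameters, any downstream
# fairness correction can be scored against the unbiased ground truth.
print(observed["group"].value_counts(normalize=True))
```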
                    
                            
                            A Framework for Measuring and Benchmarking Fairness of Generative Crowd-Flow Models
                        
                    
    
Urban population growth has significantly complicated the management of mobility systems, demanding innovative tools for planning. Generative Crowd-Flow (GCF) models, which leverage machine learning to simulate urban movement patterns, offer a promising solution but lack sufficient evaluation of their fairness, a critical factor for equitable urban planning. We present an approach to measure and benchmark the fairness of GCF models by developing a first-of-its-kind set of fairness metrics specifically tailored for this purpose. Using observed flow data, we employ a stochastic biased sampling approach to generate multiple permutations of Origin-Destination (OD) datasets, each exhibiting intentional bias. Our proposed framework allows for the comparison of multiple GCF models to evaluate how they introduce bias into their outputs. Preliminary results indicate a tradeoff between model accuracy and fairness, underscoring the need for careful consideration in the deployment of these technologies. To this end, this study bridges the gap between the human mobility literature and fairness in machine learning, with the potential to help urban planners and policymakers leverage GCF models for more equitable urban infrastructure development.
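The paper's exact metrics are not reproduced here; the following is a minimal sketch of the accuracy-versus-fairness comparison the abstract describes, assuming the standard Common Part of Commuters (CPC) score for accuracy and an illustrative group-wise CPC gap as the fairness measure. The model names, group labels, and synthetic flows are hypothetical.

```python
# Minimal sketch: score candidate crowd-flow models on accuracy and on the
# disparity between OD pairs originating in two region groups.
import numpy as np

def cpc(true_flows: np.ndarray, pred_flows: np.ndarray) -> float:
    """Common Part of Commuters: a standard accuracy score for OD flows (1 = perfect)."""
    return 2 * np.minimum(true_flows, pred_flows).sum() / (true_flows.sum() + pred_flows.sum())

def group_cpc_gap(true_flows: np.ndarray, pred_flows: np.ndarray, advantaged: np.ndarray) -> float:
    """Illustrative fairness measure: absolute CPC difference between OD pairs
    whose origins lie in advantaged vs. disadvantaged regions."""
    return abs(cpc(true_flows[advantaged], pred_flows[advantaged])
               - cpc(true_flows[~advantaged], pred_flows[~advantaged]))

rng = np.random.default_rng(0)
observed = rng.poisson(50, size=1_000).astype(float)   # observed OD flows
advantaged = rng.random(1_000) < 0.5                    # hypothetical origin group label

# Two hypothetical model outputs: B has less noise but systematically
# inflates flows originating in advantaged regions.
models = {
    "model_A": np.clip(observed * rng.normal(1.0, 0.10, 1_000), 0, None),
    "model_B": np.clip(observed * rng.normal(1.0, 0.05, 1_000) + 10 * advantaged, 0, None),
}
for name, predicted in models.items():
    print(name, "CPC:", round(cpc(observed, predicted), 3),
          "group gap:", round(group_cpc_gap(observed, predicted, advantaged), 3))
```

Running the two models through the same observed flows makes the accuracy-fairness tradeoff explicit: a model can raise its overall CPC while widening the gap between region groups.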
        
    
- Award ID(s): 2304213
- PAR ID: 10606402
- Publisher / Repository: Association for Computing Machinery (ACM)
- Date Published:
- Journal Name: ACM Journal on Computing and Sustainable Societies
- Volume: 3
- Issue: 2
- ISSN: 2834-5533
- Format(s): Medium: X
- Size(s): p. 1-27
- Sponsoring Org: National Science Foundation
More Like this
- Machine learning (ML) is playing an increasing role in decision-making tasks that directly affect individuals, e.g., loan approvals or job applicant screening. Significant concerns arise that, without special provisions, individuals from under-privileged backgrounds may not get equitable access to services and opportunities. Existing research studies fairness with respect to protected attributes such as gender, race, or income, but the impact of location data on fairness has been largely overlooked. With the widespread adoption of mobile apps, geospatial attributes are increasingly used in ML, and their potential to introduce unfair bias is significant, given their high correlation with protected attributes. We propose techniques to mitigate location bias in machine learning. Specifically, we consider the issue of miscalibration when dealing with geospatial attributes. We focus on spatial group fairness and propose a spatial indexing algorithm that accounts for fairness. Our KD-tree-inspired approach significantly improves fairness while maintaining high learning accuracy, as shown by extensive experimental results on real data.
- Applying machine learning to clinical outcome prediction is challenging due to imbalanced datasets and sensitive tasks that contain rare yet critical outcomes and where equitable treatment across diverse patient groups is essential. Despite attempts, biases in predictions persist, driven by disparities in representation and exacerbated by the scarcity of positive labels, perpetuating health inequities. This paper introduces a synthetic data generation approach leveraging large language models to address these issues. The approach enhances algorithmic performance and reduces bias by creating realistic, anonymous synthetic patient data that improves representation and augments dataset patterns while preserving privacy. Through experiments on multiple datasets, we demonstrate that it boosts mortality prediction performance across diverse subgroups, achieving up to a 21% improvement in F1 Score without requiring additional data or altering downstream training pipelines. Furthermore, it consistently reduces subgroup performance gaps, as shown by universal improvements in performance and fairness metrics across four experimental setups.
- Data-driven algorithms are only as good as the data they work with, while datasets, especially social data, often fail to represent minorities adequately. Representation bias in data can happen for various reasons, ranging from historical discrimination to selection and sampling biases in data acquisition and preparation methods. Given that "bias in, bias out," one cannot expect AI-based solutions to have equitable outcomes for societal applications without addressing issues such as representation bias. While there has been extensive study of fairness in machine learning models, including several review papers, bias in the data has been less studied. This article reviews the literature on identifying and resolving representation bias as a feature of a dataset, independent of how the data are consumed later. The scope of this survey is bounded to structured (tabular) and unstructured (e.g., image, text, graph) data. It presents taxonomies to categorize the studied techniques based on multiple design dimensions and provides a side-by-side comparison of their properties. There is still a long way to go to fully address representation bias issues in data. The authors hope that this survey motivates researchers to approach these challenges in the future by observing existing work within their respective domains.
- Energy justice advocates for the equitable and accessible provision of energy services, mainly focusing on marginalized communities. Adopting machine learning in analyzing energy-related data can unintentionally reinforce social inequalities. This perspective highlights the stages in the machine learning process where biases may emerge, from data collection and model development to deployment. Each phase presents distinct challenges and consequences, ultimately influencing the fairness and accuracy of machine learning models. The ramifications of machine learning bias within the energy sector are profound, encompassing issues such as inequalities, the perpetuation of negative feedback loops, privacy concerns, and economic impacts arising from energy burden and energy poverty. Recognizing and rectifying these biases is imperative for leveraging technology to advance society rather than perpetuating existing injustices. Addressing biases at the intersection of energy justice and machine learning requires a comprehensive approach, acknowledging the interconnectedness of social, economic, and technological factors.