NSF PAR Search | NSF Public Access Repository

Disparate Effect Of Missing Mediators On Transportability of Causal Effects

Mhasawade, Vishwali; Chunara, Rumi (May 2025, Proceedings of Machine Learning Research)

Free, publicly-accessible full text available May 24, 2026
Mitigating Urban-Rural Disparities in Contrastive Representation Learning with Satellite Imagery

https://doi.org/10.1609/aies.v7i1.31760

Zhang, Miao; Chunara, Rumi (October 2024, Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society)

Satellite imagery is being leveraged for many societally critical tasks across climate, economics, and public health. Yet, because of heterogeneity in landscapes (e.g., how a road looks in different places), models can show disparate performance across geographic areas. Given the potential for disparities in algorithmic systems used in societal contexts, here we consider the risk of urban-rural disparities in identification of land-cover features. This is via semantic segmentation (a common computer vision task in which image regions are labelled according to what is being shown), which uses pre-trained image representations generated via contrastive self-supervised learning. We propose fair dense representation with contrastive learning (FairDCL) as a method for de-biasing the multi-level latent space of a convolutional neural network. The method improves feature identification by removing spurious latent representations which are disparately distributed across urban and rural areas, and is achieved in an unsupervised way by contrastive pre-training. The pre-trained image representation mitigates downstream urban-rural prediction disparities and outperforms state-of-the-art baselines on real-world satellite images. Embedding space evaluation and ablation studies further demonstrate FairDCL’s robustness. As generalizability and robustness in geographic imagery are a nascent topic, our work motivates researchers to consider metrics beyond average accuracy in such applications.
Full Text Available
Quantifying greenspace with satellite images in Karachi, Pakistan using a new data augmentation paradigm

https://doi.org/10.1145/3716370

Zhang, Miao; Arshad, Hajra; Abbas, Manzar; Jehanzeb, Hamzah; Tahir, Izza; Hassan, Javerya; Samad, Zainab; Chunara, Rumi (February 2025, ACM Journal on Computing and Sustainable Societies)

Greenspaces in communities are critical for mitigating effects of climate change and have important impacts on health. Today, the availability of satellite imagery data combined with deep learning methods allows for automated greenspace analysis at high resolution. We propose a novel green color augmentation for deep learning model training to better detect and delineate types of greenspace (trees, grass) with satellite imagery. Our method outperforms gold standard methods, which use vegetation indices, by 33.1% (accuracy) and 77.7% (intersection-over-union; IoU). The proposed augmentation technique also shows improvement over state-of-the-art deep learning-based methods by 13.4% (IoU) and 3.11% (accuracy) for greenspace segmentation. We apply the method to high-resolution (0.27 m/pixel) satellite images covering Karachi, Pakistan, and illuminate an important need: Karachi has 4.17 m² of greenspace per capita, which significantly lags World Health Organization recommendations. Moreover, greenspaces in Karachi are often in areas of economic development (Pearson’s correlation coefficient shows a 0.352 correlation between greenspaces and roads, p < 0.001), and correspond to higher land surface temperature in localized areas. Our greenspace analysis and how it relates to infrastructure and climate is relevant to urban planners, public health and government professionals, and ultimately the public, for improved allocation and development of greenspaces.
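For readers unfamiliar with the two segmentation metrics quoted in this abstract, the following is a minimal sketch of how pixel accuracy and intersection-over-union (IoU) are commonly computed for a single class. The function names and the toy masks are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
# Illustrative sketch only: function names and toy masks are assumptions,
# not the authors' evaluation code.

def _pairs(pred, truth):
    """Flatten two equally shaped 2D label masks into (pred, truth) pixel pairs."""
    return [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the reference label."""
    pairs = _pairs(pred, truth)
    return sum(p == t for p, t in pairs) / len(pairs)

def iou(pred, truth, cls=1):
    """Intersection-over-union for one class label (here 1 = greenspace)."""
    pairs = _pairs(pred, truth)
    inter = sum(p == cls and t == cls for p, t in pairs)
    union = sum(p == cls or t == cls for p, t in pairs)
    # IoU is undefined when the class is absent from both masks.
    return inter / union if union else float("nan")

# Toy 2x3 masks: 1 = greenspace, 0 = background.
truth = [[1, 1, 0],
         [0, 1, 0]]
pred = [[1, 0, 0],
        [0, 1, 1]]

print(pixel_accuracy(pred, truth))  # 4 of 6 pixels agree
print(iou(pred, truth))             # 2 shared pixels / 4 pixels in the union
```

IoU penalizes both missed and spurious greenspace pixels, which is why it can move differently from accuracy when the class of interest covers a small share of the image.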
Free, publicly-accessible full text available February 8, 2026
Utilizing big data without domain knowledge impacts public health decision-making

https://doi.org/10.1073/pnas.2402387121

Zhang, Miao; Rahman, Salman; Mhasawade, Vishwali; Chunara, Rumi (September 2024, Proceedings of the National Academy of Sciences)

New data sources and AI methods for extracting information are increasingly abundant and relevant to decision-making across societal applications. A notable example is street view imagery, available in over 100 countries, and purported to inform built environment interventions (e.g., adding sidewalks) for community health outcomes. However, biases can arise when decision-making does not account for data robustness or relies on spurious correlations. To investigate this risk, we analyzed 2.02 million Google Street View (GSV) images alongside health, demographic, and socioeconomic data from New York City. Findings demonstrate robustness challenges; built environment characteristics inferred from GSV labels at the intracity level often do not align with ground truth. Moreover, as average individual-level behavior of physical inactivity significantly mediates the impact of built environment features by census tract, intervention on features measured by GSV would be misestimated without proper model specification and consideration of this mediation mechanism. Using a causal framework accounting for these mediators, we determined that intervening by improving 10% of samples in the two lowest tertiles of physical inactivity would lead to a 4.17 (95% CI 3.84–4.55) or 17.2 (95% CI 14.4–21.3) times greater decrease in the prevalence of obesity or diabetes, respectively, compared to the same proportional intervention on the number of crosswalks by census tract. This study highlights critical issues of robustness and model specification in using emergent data sources, showing the data may not measure what is intended, and ignoring mediators can result in biased intervention effect estimates.
Full Text Available
Measures of Disparity and their Efficient Estimation

https://doi.org/10.1145/3600211.3604697

Singh, Harvineet; Chunara, Rumi (August 2023, ACM AIES)
Area-based determinants of outreach vaccination for reaching vulnerable populations: A cross-sectional study in Pakistan

https://doi.org/10.1371/journal.pgph.0001703

Chen, Xiaoting; Porter, Allan; Abdur_Rehman, Nabeel; Morris, Shaun K; Saif, Umar; Chunara, Rumi (September 2023, PLOS Global Public Health)
Singh, Aditya (Ed.)
The objective of this study is to gain a comparative understanding of spatial determinants for outreach and clinic vaccination, which is critical for operationalizing efforts and breaking down structural biases, and is particularly relevant in countries where resources are low and sub-region variance is high. Leveraging a massive effort to digitize public system reporting by Lady and Community Health Workers (CHWs) with geo-located data on over 4 million public-sector vaccinations from September 2017 through 2019, understanding health service operations in relation to vulnerable spatial determinants was made feasible. Location and type of vaccinations (clinic or outreach) were compared to regional spatial attributes where they were performed. Important spatial attributes were assessed using three modeling approaches (ridge regression, gradient boosting, and a generalized additive model). Consistent predictors for outreach, clinic, and proportion of third dose pentavalent vaccinations by region were identified. Of all Penta-3 vaccination records, 86.3% were performed by outreach efforts. At the tehsil level (fourth-order administrative unit), controlling for child population, population density, proportion of population in urban areas, distance to cities, average maternal education, and other relevant factors, increased poverty was significantly associated with more in-clinic vaccinations (β = 0.077), and lower proportion of outreach vaccinations by region (β = -0.083). Analyses at the union council level (fifth-order administrative unit) showed consistent results for the differential importance of poverty for outreach versus clinic vaccination. Relevant predictors for each type of vaccination (outreach vs. in-clinic) show how design of outreach vaccination can effectively augment vaccination efforts beyond healthcare services through clinics.
As Pakistan is third among countries with the most unvaccinated and under-vaccinated children, understanding barriers and factors associated with vaccination can be demonstrative for other national and sub-national regions facing challenges and also inform guidelines on supporting CHWs in health systems.
Full Text Available
When do Minimax-fair Learning and Empirical Risk Minimization Coincide?

Singh, Harvineet; Kleindessner, Matthäus; Cevher, Volkan; Chunara, Rumi; Russell, Chris (June 2023, ICML 2023 Poster)

Minimax-fair machine learning minimizes the error for the worst-off group. However, empirical evidence suggests that when sophisticated models are trained with standard empirical risk minimization (ERM), they often have the same performance on the worst-off group as a minimax-trained model. Our work makes this counter-intuitive observation concrete. We prove that if the hypothesis class is sufficiently expressive and the group information is recoverable from the features, ERM and minimax-fairness learning formulations indeed have the same performance on the worst-off group. We provide additional empirical evidence of how this observation holds on a wide range of datasets and hypothesis classes. Since ERM is fundamentally easier than minimax optimization, our findings have implications on the practice of fair machine learning.
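As a reading aid, the two learning formulations contrasted in this abstract are usually written as follows. The notation is an assumption introduced here for illustration (hypothesis class \mathcal{H}, loss \ell, group variable G taking values in \mathcal{G}) and is not taken from the listing; the paper's own notation may differ.

```latex
% ERM: minimize the average risk over the whole population.
h_{\mathrm{ERM}} \in \arg\min_{h \in \mathcal{H}} \;
  \mathbb{E}\big[\ell(h(X), Y)\big]

% Minimax fairness: minimize the risk of the worst-off group.
h_{\mathrm{MM}} \in \arg\min_{h \in \mathcal{H}} \; \max_{g \in \mathcal{G}} \;
  \mathbb{E}\big[\ell(h(X), Y) \mid G = g\big]

% The abstract's claim, under its stated conditions (expressive hypothesis
% class, group information recoverable from the features):
\max_{g \in \mathcal{G}} \mathbb{E}\big[\ell(h_{\mathrm{ERM}}(X), Y) \mid G = g\big]
  = \max_{g \in \mathcal{G}} \mathbb{E}\big[\ell(h_{\mathrm{MM}}(X), Y) \mid G = g\big]
```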
Full Text Available
When do Minimax-fair Learning and Empirical Risk Minimization Coincide?

Singh, Harvineet; Kleindessner, Matthäus; Cevher, Volkan; Chunara, Rumi; Russell, Chris (April 2023, ICML 2023 Poster)

Minimax-fair machine learning minimizes the error for the worst-off group. However, empirical evidence suggests that when sophisticated models are trained with standard empirical risk minimization (ERM), they often have the same performance on the worst-off group as a minimax-trained model. Our work makes this counter-intuitive observation concrete. We prove that if the hypothesis class is sufficiently expressive and the group information is recoverable from the features, ERM and minimax-fairness learning formulations indeed have the same performance on the worst-off group. We provide additional empirical evidence of how this observation holds on a wide range of datasets and hypothesis classes. Since ERM is fundamentally easier than minimax optimization, our findings have implications on the practice of fair machine learning.
Full Text Available
Is there a need for graduate-level programmes in health data science? A perspective from Pakistan

https://doi.org/10.1016/S2214-109X(22)00459-4

Hoodbhoy, Zahra; Chunara, Rumi; Waljee, Akbar; AbuBakr, Amina; Samad, Zainab (January 2023, The Lancet Global Health)

Full Text Available
Data Science in Public Health: Building Next Generation Capacity

https://doi.org/10.1162/99608f92.18da72db

Mirin, Nicholas; Mattie, Heather; Jackson, Latifa; Samad, Zainab; Chunara, Rumi (October 2022, Harvard Data Science Review)

Full Text Available
