Traditional cancer rate estimates are often limited in spatial resolution and lack consideration of environmental factors. Satellite imagery has become a vital data source for monitoring diverse urban environments, supporting applications across environmental, socio-demographic, and public health domains. However, while deep learning (DL) tools, particularly convolutional neural networks, have demonstrated strong performance in extracting features from high-resolution imagery, their reliance on local spatial cues often limits their ability to capture complex, non-local, and higher-order structural information. To overcome this limitation, we propose a novel LLM-based multi-agent coordination system for satellite image analysis, which integrates visual and contextual reasoning through a simplicial contrastive learning framework (Agent-SNN). Agent-SNN builds two augmented superpixel-based graphs and maximizes mutual information between their latent simplicial complex representations, thereby enabling the system to learn both local and global topological features. The LLM-based agents generate structured prompts that guide the alignment of these representations across modalities. Experiments with satellite imagery of Los Angeles and San Diego demonstrate that Agent-SNN achieves significant improvements over state-of-the-art baselines in regional cancer prevalence estimation tasks.
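A minimal sketch of the mutual-information objective behind such two-view contrastive training is shown below, assuming the two augmented superpixel-graph views have already been encoded into fixed-size embeddings; the InfoNCE-style loss, shapes, and names are illustrative assumptions, not Agent-SNN's exact formulation.

```python
# Minimal sketch of an InfoNCE-style contrastive objective between embeddings
# of two augmented graph views, a common lower-bound proxy for mutual
# information. Encoders, simplicial complexes, and LLM prompt alignment are
# not shown; the shapes and names here are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of the two augmented graph views."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # pairwise cosine similarities
    targets = torch.arange(z1.size(0))      # positives sit on the diagonal
    # Symmetrized cross-entropy: each view must identify its counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example: a batch of 8 graph pairs with 128-dimensional embeddings.
loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```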
This content will become publicly available on April 6, 2026

Fusing Multimodality of Large Language Models and Satellite Imagery via Simplicial Contrastive Learning for Latent Urban Feature Identification and Environmental Application
Satellite imagery is a readily available data source for monitoring a broad range of urban geographical contexts related to environmental, socio-demographic, and health disparities. To analyze satellite images, deep learning (DL) tools efficiently extract latent multi-dimensional characteristics, beyond identifying specific urban elements like roads and houses. However, current DL approaches largely rely on convolutional neural networks applied to high-resolution imagery, and as such may be limited to capturing only local contextual information. To address this fundamental limitation, we propose to fuse the modalities of satellite imagery and a large language model (LLM). In particular, we develop a novel LLM-based Simplicial Contrastive Learning model (LLM-SCL) based on mutual information maximization between the latent simplicial complex-level representations of two kinds of augmented (superpixel) graphs, which allows for cohesive integration of LLM prompts and learning of both local and global higher-order properties of satellite imagery (from all pixels in an image). Extensive experiments on satellite imagery at several resolutions in Tijuana, Mexico, and in Los Angeles and San Diego, USA, suggest that LLM-SCL significantly outperforms state-of-the-art baselines on unsupervised image classification tasks. As such, the proposed LLM-SCL opens a new path for more accurate evaluations of latent urban forms and their associations with environmental and health outcome disparities.
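A minimal sketch of the superpixel-graph construction underlying such models is given below, using SLIC segmentation and a simple adjacency rule; the segmentation parameters and mean-RGB node features are illustrative assumptions rather than LLM-SCL's configuration.

```python
# Minimal sketch: build a superpixel adjacency graph from a satellite image.
# Node features (mean RGB) and SLIC parameters are illustrative assumptions.
import numpy as np
import networkx as nx
from skimage.segmentation import slic

def superpixel_graph(image: np.ndarray, n_segments: int = 200) -> nx.Graph:
    """image: (H, W, 3) RGB array in [0, 1]. Returns a region adjacency graph."""
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    g = nx.Graph()
    for lab in np.unique(labels):
        g.add_node(int(lab), feature=image[labels == lab].mean(axis=0))
    # Connect superpixels whose pixels touch horizontally or vertically.
    for a, b in zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()):
        if a != b:
            g.add_edge(int(a), int(b))
    for a, b in zip(labels[:-1, :].ravel(), labels[1:, :].ravel()):
        if a != b:
            g.add_edge(int(a), int(b))
    return g

# Example on a random "image"; real use would load a satellite tile.
graph = superpixel_graph(np.random.rand(128, 128, 3))
print(graph.number_of_nodes(), graph.number_of_edges())
```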
- PAR ID: 10639308
- Publisher / Repository: IEEE
- Date Published:
- Page Range / eLocation ID: 1 to 5
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Satellite imagery is being leveraged for many societally critical tasks across climate, economics, and public health. Yet, because of heterogeneity in landscapes (e.g., how a road looks in different places), models can show disparate performance across geographic areas. Given the important potential for disparities in algorithmic systems used in societal contexts, here we consider the risk of urban-rural disparities in the identification of land-cover features. This is via semantic segmentation (a common computer vision task in which image regions are labelled according to what is being shown), which uses pre-trained image representations generated via contrastive self-supervised learning. We propose fair dense representation with contrastive learning (FairDCL) as a method for de-biasing the multi-level latent space of a convolutional neural network. The method improves feature identification by removing spurious latent representations which are disparately distributed across urban and rural areas, and is achieved in an unsupervised way by contrastive pre-training. The pre-trained image representation mitigates downstream urban-rural prediction disparities and outperforms state-of-the-art baselines on real-world satellite images. Embedding space evaluation and ablation studies further demonstrate FairDCL’s robustness. As generalizability and robustness in geographic imagery is a nascent topic, our work motivates researchers to consider metrics beyond average accuracy in such applications. (An illustrative disparity check is sketched after this list.)
- Poverty maps derived from satellite imagery are increasingly used to inform high-stakes policy decisions, such as the allocation of humanitarian aid and the distribution of government resources. Such poverty maps are typically constructed by training machine learning algorithms on a relatively modest amount of “ground truth” data from surveys, and then predicting poverty levels in areas where imagery exists but surveys do not. Using survey and satellite data from ten countries, this paper investigates disparities in representation, systematic biases in prediction errors, and fairness concerns in satellite-based poverty mapping across urban and rural lines, and shows how these phenomena affect the validity of policies based on predicted maps. Our findings highlight the importance of careful error and bias analysis before using satellite-based poverty maps in real-world policy decisions. (An illustrative urban-rural error check is sketched after this list.)
- Automatic satellite-based reconstruction enables the large-scale and widespread reconstruction of urban areas. However, satellite imagery is often noisy and incomplete, and is not suitable for reconstructing detailed building facades. We present a machine learning-based inverse procedural modeling method to automatically create synthetic facades from satellite imagery. Our key observation is that building facades exhibit regular, grid-like structures. Hence, we can overcome the low-resolution, noisy, and partial building data obtained from satellite imagery by synthesizing the underlying facade layout. Our method infers regular facade details from satellite-based image fragments of a building and applies them to occluded or under-sampled parts of the building, resulting in plausible, crisp facades. Using urban areas from six cities, we compare our approach to several state-of-the-art image completion/in-filling methods, and our approach consistently creates better facade images. (A toy facade-grid generator is sketched after this list.)
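The FairDCL item above argues for reporting metrics beyond average accuracy. A minimal sketch of one such check is given below, assuming per-image segmentation IoU scores and an urban/rural indicator are already available; the scores, variable names, and gap metric are illustrative assumptions rather than the paper's evaluation protocol.

```python
# Minimal sketch: quantify urban-rural disparity in segmentation quality.
# The per-image IoU scores and group labels are illustrative placeholders.
import numpy as np

def group_disparity(iou: np.ndarray, is_urban: np.ndarray) -> dict:
    """iou: per-image IoU scores; is_urban: boolean mask of the same length."""
    urban, rural = iou[is_urban].mean(), iou[~is_urban].mean()
    return {"urban_mIoU": urban, "rural_mIoU": rural, "gap": urban - rural}

# Example with synthetic scores for six images (four urban, two rural).
print(group_disparity(np.array([0.71, 0.68, 0.74, 0.70, 0.55, 0.58]),
                      np.array([True, True, True, True, False, False])))
```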
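The poverty-mapping item describes training a predictor on a modest amount of survey “ground truth” and then examining urban-rural error disparities. The sketch below illustrates that workflow on synthetic data with scikit-learn; the features, wealth index, and random-forest model are stand-ins, not the paper's pipeline.

```python
# Minimal sketch: fit a poverty predictor on image-derived features and compare
# prediction error across urban and rural areas. Features, labels, and the
# model choice are illustrative assumptions, not the paper's pipeline.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                 # stand-in for satellite features
urban = rng.random(500) < 0.5                  # urban/rural indicator
y = X[:, 0] + 0.5 * urban + rng.normal(scale=0.3, size=500)  # synthetic wealth index

X_tr, X_te, y_tr, y_te, u_tr, u_te = train_test_split(X, y, urban, random_state=0)
pred = RandomForestRegressor(random_state=0).fit(X_tr, y_tr).predict(X_te)

print("urban MAE:", mean_absolute_error(y_te[u_te], pred[u_te]))
print("rural MAE:", mean_absolute_error(y_te[~u_te], pred[~u_te]))
```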
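The facade item rests on the observation that building facades exhibit regular, grid-like structure. The toy generator below makes that observation concrete by synthesizing a binary window-grid mask from a handful of layout parameters; the parameters are arbitrary, and this is not the paper's learned inverse procedural model.

```python
# Toy sketch: synthesize a binary facade mask as a regular grid of windows.
# The grid parameters are illustrative; the paper infers them from imagery.
import numpy as np

def facade_mask(rows=5, cols=8, win=(12, 8), gap=(8, 10)) -> np.ndarray:
    h = rows * (win[0] + gap[0]) + gap[0]
    w = cols * (win[1] + gap[1]) + gap[1]
    mask = np.zeros((h, w), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            y = gap[0] + r * (win[0] + gap[0])
            x = gap[1] + c * (win[1] + gap[1])
            mask[y:y + win[0], x:x + win[1]] = 1   # mark one window
    return mask

print(facade_mask().shape)  # wall pixels are 0, window pixels are 1
```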