Title: An evaluation framework for predictive models of neighbourhood change with applications to predicting residential sales in Buffalo, NY
New data and technologies, in particular machine learning, may make it possible to forecast neighbourhood change. Doing so may help, for example, to prevent the negative impacts of gentrification on marginalised communities. However, predictive models of neighbourhood change face four challenges: accuracy (are they right?), granularity (are they right at spatial or temporal scales that actually matter for a policy response?), bias (are they equitable?) and expert validity (do models and their predictions make sense to domain experts?). The present work provides a framework to evaluate the performance of predictive models of neighbourhood change along these four dimensions. We illustrate the application of our evaluation framework via a case study of Buffalo, NY, where we consider the following prediction task: given historical data, can we predict the percentage of residential buildings that will be sold or foreclosed on in a given area over a fixed amount of time into the future?
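As a rough illustration (not taken from the paper), the sketch below scores a set of hypothetical area-level predictions along two of the four dimensions, accuracy and an equity-oriented error gap; the column names, the grouping flag, and the gap metric are assumptions made for the example, not the framework's actual metrics.

```python
# Hypothetical sketch: scoring area-level predictions of the share of
# residential buildings sold/foreclosed, along two of the four dimensions
# (accuracy and bias). Column names and the parity metric are assumptions,
# not taken from the paper.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

def evaluate(df: pd.DataFrame) -> dict:
    """df holds one row per area: y_true, y_pred, and a 'majority_minority'
    flag used here as a stand-in grouping for the equity check."""
    out = {"mae": mean_absolute_error(df["y_true"], df["y_pred"])}
    # Bias: compare mean absolute error across demographic groups.
    group_mae = (
        df.assign(abs_err=(df["y_true"] - df["y_pred"]).abs())
          .groupby("majority_minority")["abs_err"].mean()
    )
    out["error_gap"] = float(group_mae.max() - group_mae.min())
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = pd.DataFrame({
        "y_true": rng.uniform(0, 0.2, 200),            # share of buildings sold
        "majority_minority": rng.integers(0, 2, 200),  # assumed grouping flag
    })
    demo["y_pred"] = demo["y_true"] + rng.normal(0, 0.02, 200)
    print(evaluate(demo))
```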
Award ID(s):
1939579
PAR ID:
10448169
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  
Publisher / Repository:
SAGE Publications
Date Published:
Journal Name:
Urban Studies
Volume:
61
Issue:
5
ISSN:
0042-0980
Format(s):
Medium: X
Size(s):
p. 838-858
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Objective: Comprehensive studies examining longitudinal predictors of dietary change during the coronavirus disease 2019 pandemic are lacking. Based on an ecological framework, this study used longitudinal data to test whether individual, social and environmental factors predicted change in dietary intake during the peak of the coronavirus disease 2019 pandemic in Los Angeles County and examined interactions among the multilevel predictors. Design: We analysed two survey waves (i.e. baseline and follow-up) of the Understanding America Study, administered online to the same participants 3 months apart. The surveys assessed dietary intake and individual, social, and neighbourhood factors potentially associated with diet. Lagged multilevel regression models were used to predict change from baseline to follow-up in daily servings of fruits, vegetables and sugar-sweetened beverages. Setting: Data were collected in October 2020 and January 2021, during the peak of the coronavirus disease 2019 pandemic in Los Angeles County. Participants: 903 adults representative of Los Angeles County households. Results: Individuals who had depression and less education or who identified as non-Hispanic Black or Hispanic reported unhealthy dietary changes over the study period. Individuals with smaller social networks, especially low-income individuals with smaller networks, also reported unhealthy dietary changes. After accounting for individual and social factors, neighbourhood factors were generally not associated with dietary change. Conclusions: Given that poor diets are a leading cause of death in the USA, addressing ecological risk factors that put some segments of the community at risk for unhealthy dietary changes during a crisis should be a priority for health interventions and policy.
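For readers unfamiliar with the modelling approach, the following is a minimal sketch of a lagged mixed-effects regression in the spirit of the analysis described above; the variable names and data are simulated placeholders, and the study's actual covariates, survey weights and interaction terms are not reproduced.

```python
# Illustrative sketch only: a lagged mixed-effects model with made-up
# variable names and simulated data (not the study's actual specification).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "neighbourhood": rng.integers(0, 30, n),             # grouping factor
    "fruit_baseline": rng.poisson(2, n).astype(float),   # servings/day, wave 1
    "depression": rng.integers(0, 2, n),
    "social_network_size": rng.poisson(5, n),
})
# Follow-up intake depends on baseline (the "lag") plus individual factors.
df["fruit_followup"] = (
    0.6 * df["fruit_baseline"] - 0.3 * df["depression"]
    + 0.05 * df["social_network_size"] + rng.normal(0, 0.5, n)
)

# Random intercept per neighbourhood; baseline intake enters as the lagged term.
model = smf.mixedlm(
    "fruit_followup ~ fruit_baseline + depression + social_network_size",
    df, groups=df["neighbourhood"],
)
print(model.fit().summary())
```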
  2. Many visual analytics systems allow users to interact with machine learning models towards the goals of data exploration and insight generation on a given dataset. However, in some situations, insights may be less important than the production of an accurate predictive model for future use. In that case, users are more interested in generating diverse and robust predictive models, verifying their performance on holdout data, and selecting the most suitable model for their usage scenario. In this paper, we consider the concept of Exploratory Model Analysis (EMA), which is defined as the process of discovering and selecting relevant models that can be used to make predictions on a data source. We delineate the differences between EMA and the well‐known term exploratory data analysis in terms of the desired outcome of the analytic process: insights into the data or a set of deployable models. The contributions of this work are a visual analytics system workflow for EMA, a user study, and two use cases validating the effectiveness of the workflow. We found that our system workflow enabled users to generate complex models, to assess them for various qualities, and to select the most relevant model for their task.
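A minimal, hypothetical sketch of the selection step that EMA revolves around: fit several candidate models, verify each on holdout data, and keep the most suitable one. The paper's system is interactive and visual; only the comparison logic is mirrored here, on synthetic data.

```python
# Hypothetical EMA-style loop: train diverse candidates, score on holdout
# data, and select the best-performing model for deployment.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
scores = {name: accuracy_score(y_ho, m.fit(X_tr, y_tr).predict(X_ho))
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> deploy:", best)
```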
  3. Beiko, Robert G (Ed.)
    ABSTRACT Inflammatory bowel disease (IBD) is characterized by complex etiology and a disrupted colonic ecosystem. We provide a framework for the analysis of multi-omic data, which we apply to study the gut ecosystem in IBD. Specifically, we train and validate models using data on the metagenome, metatranscriptome, virome, and metabolome from the Human Microbiome Project 2 IBD multi-omic database, with 1,785 repeated samples from 130 individuals (103 cases and 27 controls). After splitting the participants into training and testing groups, we used mixed-effects least absolute shrinkage and selection operator regression to select features for each omic. These features, with demographic covariates, were used to generate separate single-omic prediction scores. All four single-omic scores were then combined into a final regression to assess the relative importance of the individual omics and the predictive benefits when considered together. We identified several species, pathways, and metabolites known to be associated with IBD risk, and we explored the connections between data sets. Individually, metabolomic and viromic scores were more predictive than metagenomics or metatranscriptomics, and when all four scores were combined, we predicted disease diagnosis with a Nagelkerke's R² of 0.46 and an area under the curve of 0.80 (95% confidence interval: 0.63, 0.98). Our work supports that some single-omic models for complex traits are more predictive than others, that incorporating multiple omic data sets may improve prediction, and that each omic data type provides a combination of unique and redundant information. This modeling framework can be extended to other complex traits and multi-omic data sets. IMPORTANCE Complex traits are characterized by many biological and environmental factors, such that multi-omic data sets are well-positioned to help us understand their underlying etiologies. We applied a prediction framework across multiple omics (metagenomics, metatranscriptomics, metabolomics, and viromics) from the gut ecosystem to predict inflammatory bowel disease (IBD) diagnosis. The predicted scores from our models highlighted key features and allowed us to compare the relative utility of each omic data set in single-omic versus multi-omic models. Our results emphasized the importance of metabolomics and viromics over metagenomics and metatranscriptomics for predicting IBD status. The greater predictive capability of metabolomics and viromics is likely because these omics serve as markers of lifestyle factors such as diet. This study provides a modeling framework for multi-omic data, and our results show the utility of combining multiple omic data types to disentangle complex disease etiologies and biological signatures.
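The stacked design described above can be sketched roughly as follows. This uses simulated data and ordinary LASSO plus logistic regression, so the mixed-effects LASSO, the repeated-measures structure, and the Nagelkerke R² reported in the paper are not reproduced; it only illustrates the per-omic scoring and the final combined model.

```python
# Hypothetical sketch of a stacked multi-omic model: per-omic feature
# selection, one prediction score per omic, then a combined final regression.
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 130
y = rng.integers(0, 2, n)  # IBD case/control label (simulated)
omics = {name: rng.normal(size=(n, 50)) + y[:, None] * rng.normal(0.3, 0.1, 50)
         for name in ["metagenome", "metatranscriptome", "virome", "metabolome"]}

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=2)
scores_tr, scores_te = [], []
for name, X in omics.items():
    lasso = LassoCV(cv=5).fit(X[idx_tr], y[idx_tr])  # per-omic feature selection
    scores_tr.append(lasso.predict(X[idx_tr]))       # single-omic score (train)
    scores_te.append(lasso.predict(X[idx_te]))       # single-omic score (test)

# Combine the four single-omic scores in a final regression.
final = LogisticRegression().fit(np.column_stack(scores_tr), y[idx_tr])
auc = roc_auc_score(y[idx_te],
                    final.predict_proba(np.column_stack(scores_te))[:, 1])
print(f"combined-omic AUC on held-out participants: {auc:.2f}")
```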
  4. Abstract Species distribution models (SDMs) have become increasingly popular for making ecological inferences, as well as predictions to inform conservation and management. In predictive modeling, practitioners often use correlative SDMs that only evaluate a single spatial scale and do not account for differences in life stages. These modeling decisions may limit the performance of SDMs beyond the study region or sampling period. Given the increasing desire to develop transferable SDMs, a robust framework is necessary that can account for known challenges of model transferability. Here, we propose a comparative framework to develop transferable SDMs, which was tested using satellite telemetry data from green turtles (Chelonia mydas). This framework is characterized by a set of steps comparing among different models based on (1) model algorithm (e.g., generalized linear model vs. Gaussian process regression) and formulation (e.g., correlative model vs. hybrid model), (2) spatial scale, and (3) accounting for life stage. SDMs were fitted as resource selection functions and trained on data from the Gulf of Mexico with bathymetric depth, net primary productivity, and sea surface temperature as covariates. Independent validation datasets from Brazil and Qatar were used to assess model transferability. A correlative SDM using a hierarchical Gaussian process regression (HGPR) algorithm exhibited greater transferability than a hybrid SDM using HGPR, as well as correlative and hybrid forms of hierarchical generalized linear models. Additionally, models that evaluated habitat selection at the finest spatial scale and that did not account for life stage proved to be the most transferable in this study. The comparative framework presented here may be applied to a variety of species, ecological datasets (e.g., presence‐only, presence‐absence, mark‐recapture), and modeling frameworks (e.g., resource selection functions, step selection functions, occupancy models) to generate transferable predictions of species–habitat associations. We expect that SDM predictions resulting from this comparative framework will be more informative management tools and may be used to more accurately assess climate change impacts on a wide array of taxa. 
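Schematically, the transferability check amounts to fitting a habitat model in one region and scoring it on independent data from another. In the sketch below the data are simulated, the covariate names mirror the abstract, and a plain logistic regression stands in for the hierarchical GLM and Gaussian process models actually compared in the study.

```python
# Schematic transferability check: train a resource-selection-style model in
# one region, evaluate it on simulated data standing in for another region.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def simulate_region(n, rng, depth_shift=0.0):
    X = pd.DataFrame({
        "depth": rng.uniform(0, 200, n) + depth_shift,  # bathymetric depth (m)
        "npp": rng.normal(1.0, 0.3, n),                 # net primary productivity
        "sst": rng.normal(27, 2, n),                    # sea surface temperature
    })
    logit = -0.02 * X["depth"] + 1.5 * X["npp"] + 0.1 * (X["sst"] - 27)
    used = rng.binomial(1, 1 / (1 + np.exp(-logit)))    # used vs. available points
    return X, used

rng = np.random.default_rng(3)
X_train, y_train = simulate_region(2000, rng)                 # training region
X_test, y_test = simulate_region(500, rng, depth_shift=20)    # validation region

rsf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("transfer AUC:",
      round(roc_auc_score(y_test, rsf.predict_proba(X_test)[:, 1]), 2))
```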
  5. In today's world, AI systems need to make sense of large amounts of data as it unfolds in real-time, whether it's a video from surveillance and monitoring cameras, streams of egocentric footage, or sequences in other domains such as text or audio. The ability to break these continuous data streams into meaningful events, discover nested structures, and predict what might happen next at different levels of abstraction is crucial for applications ranging from passive surveillance systems to sensory-motor autonomous learning. However, most existing models rely heavily on large, annotated datasets with fixed data distributions and offline epoch-based training, which makes them impractical for handling the unpredictability and scale of dynamic real-world environments. This dissertation tackles these challenges by introducing a set of predictive models designed to process streaming data efficiently, segment events, and build sequential memory models without supervision or data storage. First, we present a single-layer predictive model that segments long, unstructured video streams by detecting temporal events and spatially localizing objects in each frame. The model is applied to wildlife monitoring footage, where it processes continuous, high-frame-rate video and successfully detects and tracks events without supervision. It operates in an online streaming manner to perform simultaneous training and inference without storing or revisiting the processed data. This approach alleviates the need for manual labeling, making it ideal for handling long-duration, real-world video footage. Building on this, we introduce STREAMER, a multi-layered architecture that extends the single-layer model into a hierarchical predictive framework. STREAMER segments events at different levels of abstraction, capturing the compositional structure of activities in egocentric videos. By dynamically adapting to various timescales, it creates a hierarchy of nested events and forms more complex and abstract representations of the input data. Finally, we propose the Predictive Attractor Model (PAM), which builds biologically plausible memory models of sequential data. Inspired by neuroscience, PAM uses sparse distributed representations and local learning rules to avoid catastrophic forgetting, allowing it to continually learn and make predictions without overwriting previous knowledge. Unlike many traditional models, PAM can generate multiple potential future outcomes conditioned on the same context, which allows for handling uncertainty in generative tasks. Together, these models form a unified framework of predictive learning that addresses multiple challenges in event understanding and temporal data analyses. By using prediction as the core mechanism, they segment continuous data streams into events, discover hierarchical structures across multiple levels of abstraction, learn semantic event representations, and model sequences without catastrophic forgetting. 
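As a toy illustration of the core mechanism (prediction error as the segmentation signal), the sketch below keeps a running prediction of a one-dimensional stream and marks an event boundary whenever the error spikes. It is far simpler than the models described above and is not their implementation; the predictor, threshold, and stream are placeholders.

```python
# Toy sketch: online event segmentation driven by prediction error, with no
# supervision and no storage of past data.
import numpy as np

def segment_stream(stream, alpha=0.2, threshold=2.0):
    """Yield indices where the prediction error jumps above `threshold` sigmas."""
    pred, errors = stream[0], []
    for t, x in enumerate(stream[1:], start=1):
        err = abs(x - pred)
        errors.append(err)
        scale = np.std(errors) or 1.0
        if err > threshold * scale and len(errors) > 10:
            yield t                            # boundary: the stream "surprised" us
        pred = (1 - alpha) * pred + alpha * x  # online update, no data revisited

# Simulated 1-D stream with two regime changes as stand-ins for event boundaries.
rng = np.random.default_rng(4)
stream = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100),
                         rng.normal(-3, 1, 100)])
print(list(segment_stream(stream)))
```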