Title: A machine learning‐based analysis of 311 requests in the Miami‐Dade County
This paper illustrates the application of machine learning algorithms in predictive analytics for local governments using administrative data. The developed and tested machine learning predictive algorithm overcomes known limitations of the conventional ordinary least squares method. Such limitations include, but are not limited to, imposed linearity, presumed causality (with independent variables as presumed causes and dependent variables as presumed results), likely high multicollinearity among features, and spatial autocorrelation. The study applies the algorithms to 311 non-emergency service requests in Miami-Dade County to predict the volume of 311 service requests, and to identify the community characteristics affecting that volume, across Census tract neighborhoods. Four common families of algorithms and an ensemble of them are applied: random forest, support vector machines, lasso and elastic-net regularized generalized linear models, and extreme gradient boosting. Two feature selection methods, Boruta and fscaret, are applied to identify the significant community characteristics. The results show that the machine learning algorithms capture spatial autocorrelation and clustering. The features generated by the fscaret algorithms are parsimonious in predicting the 311 service request volume.
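The abstract does not say how spatial autocorrelation was measured. As a minimal sketch of one standard diagnostic, the Python below computes global Moran's I for values observed on neighboring units, such as model residuals by Census tract. Moran's I is chosen here as an assumption, not as the paper's method; the function names `morans_i` and `chain_weights` and the toy "tracts in a row" layout are hypothetical.

```python
def chain_weights(n):
    """Binary contiguity weights for n units arranged in a row (toy layout, an assumption)."""
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        weights[i][i + 1] = 1.0
        weights[i + 1][i] = 1.0
    return weights


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, with d_i = x_i - mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)   # total weight
    denom = sum(d * d for d in dev)         # sum of squared deviations
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (num / denom)


# Hypothetical values for four units in a row.
w = chain_weights(4)
trend = morans_i([1.0, 2.0, 3.0, 4.0], w)          # neighbors are similar: positive
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # neighbors differ: negative
```

Values near zero in model residuals would be consistent with a model having absorbed the spatial pattern, while clearly positive values indicate remaining clustering.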
Award ID(s):
1924154
PAR ID:
10344744
Author(s) / Creator(s):
; ; ;
Editor(s):
Carruthers, John; Duncan, Natasha; He, Canfei; Zhu, Shengjun
Date Published:
Journal Name:
Growth and Change
Volume:
0
ISSN:
0017-4815
Page Range / eLocation ID:
1-19
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The main purpose of this paper is to illustrate the application of a causal inference method to administrative data and the challenges of such an application. We illustrate by applying the Bayesian network method to 311 data from Miami-Dade County, Florida (USA). The 311 centers provide non-emergency services to residents. The 311 data are large and granular. We aim to explore the equity issues and biases that might exist in this particular type of service request. As a case study, the relationship between population characteristics (independent variables) and request volume and completion time (dependent variables) is examined to identify disparities, if any, in the observational data. The empirical analysis shows that there are no biases in services provided to any specific demographic, socioeconomic, or geographical groups. However, the administrative data do pose various challenges for inferring causality due to missing or impure data, inadequacy, and latent confounders. Precautions for applying causal techniques to the analysis of administrative data such as 311 are discussed.
  2. Penkert, B; Hellingrath, B; Rode, M; Widera, A; Middelhoff, M; Boersma, K; Kalthoner, M (Ed.)
    This paper introduces a machine learning tool for service systems, focusing on accurate classification of service requests and swift anomaly detection, particularly crucial during emergencies. Employing a Support Vector Machine model, this tool automatically classifies service calls into predefined categories with high accuracy, while effectively detecting irregular requests that require specific attention from operators. This approach streamlines resource management by reducing the manual categorization workload and enables early detection of emerging service needs. Examining Orange County, Florida 311 System data, with a specific focus on the COVID-19 period, we illustrate the tool's success in automatic request categorization and anomaly detection. Overall, this tool presents an effective automation approach to help with efficient resource management of service systems and proactive assessment of public service needs, promising to revolutionize service request management during crises. Future work will explore additional classification models for enhanced accuracy and integrate automated alerts for proactive disaster management.
  3. Predicting the edges of species distributions is fundamental for species conservation, ecosystem services, and management decisions. In North America, the location of the upstream limit of fish in forested streams receives special attention, because fish-bearing portions of streams have more protections during forest management activities than fishless portions. We present a novel model development and evaluation framework, wherein we compare 26 models to predict upper distribution limits of trout in streams. The models used machine learning, logistic regression, and a sophisticated nested spatial cross-validation routine to evaluate predictive performance while accounting for spatial autocorrelation. The model resulting in the best predictive performance, termed UPstream Regional LiDAR Model for Extent of Trout (UPRLIMET), is a two-stage model that uses a logistic regression algorithm calibrated to observations of Coastal Cutthroat Trout (Oncorhynchus clarkii clarkii) occurrence and variables representing hydro-topographic characteristics of the landscape. We predict trout presence along reaches throughout a stream network, and include a stopping rule to identify a discrete upper limit point above which all stream reaches are classified as fishless. Although there is no simple explanation for the upper distribution limit identified in UPRLIMET, four factors (upstream channel length above the point of uppermost fish, drainage area, slope, and elevation) had the highest importance. Across our study region of western Oregon, we found that more of the fish-bearing network is on private lands than on state, US Bureau of Land Management (BLM), or USDA Forest Service (USFS) lands, highlighting the importance of using spatially consistent maps across a region and working across land ownerships. Our research underscores the value of using occurrence data to develop simple, but powerful, prediction tools to capture complex ecological processes that contribute to distribution limits of species.
  4. The importance and complexity of the spatial join operation have resulted in the availability of many join algorithms, some of which are tailored for big-data platforms like Hadoop and Spark. The choice among them is not trivial and depends on different factors. This paper proposes the first machine-learning-based framework for spatial join query optimization that can accommodate both the characteristics of spatial datasets and the complexity of the different algorithms. The main challenge is how to develop portable cost models that, once trained, can be applied to any pair of input datasets, because they are able to extract the important input characteristics, such as data distribution and spatial partitioning, the logic of spatial join algorithms, and the relationship between the two input datasets. The proposed system defines a set of features that can be computed efficiently from the data to capture the intricate aspects of spatial join. Then, it uses these features to train five machine learning models that are used to identify the best spatial join algorithm. The first two are regression models that estimate two important measures of spatial join performance and act as the cost model. The third model chooses the best partitioning strategy to use with spatial join. The fourth and fifth models further tune two important parameters, number of partitions and plane-sweep direction, to get the best performance. Experiments on large-scale synthetic and real data show the efficiency of the proposed models over baseline methods.
  5. Spatial models for occupancy data are used to estimate and map the true presence of a species, which may depend on biotic and abiotic factors as well as spatial autocorrelation. Traditionally, researchers have accounted for spatial autocorrelation in occupancy data by using a correlated normally distributed site‐level random effect, which might be incapable of modeling nontraditional spatial dependence such as discontinuities and abrupt transitions. Machine learning approaches have the potential to model nontraditional spatial dependence, but these approaches do not account for observer errors such as false absences. By combining the flexibility of Bayesian hierarchical modeling and machine learning approaches, we present a general framework to model occupancy data that accounts for both traditional and nontraditional spatial dependence as well as false absences. We demonstrate our framework using six synthetic occupancy data sets and two real data sets. Our results demonstrate how to model both traditional and nontraditional spatial dependence in occupancy data, which enables a broader class of spatial occupancy models that can be used to improve predictive accuracy and model adequacy.