skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Integration of statistical and administrative agricultural data from Namibia
Statistical and administrative agencies often collect information on related parameters. Discrepancies between estimates from distinct data sources can arise due to differences in definitions, reference periods, and data collection protocols. Integrating statistical data with administrative data is appealing for saving data collection costs, reducing respondent burden, and improving the coherence of estimates produced by statistical and administrative agencies. Model based techniques, such as small area estimation and measurement error models, for combining multiple data sources have benefits of transparency, reproducibility, and the ability to provide an estimated uncertainty. Issues associated with integrating statistical data with administrative data are discussed in the context of data from Namibia. The national statistical agency in Namibia produces estimates of crop area using data from probability samples. Simultaneously, the Namibia Ministry of Agriculture, Water, and Forestry obtains crop area estimates through extension programs. We illustrate the use of a structural measurement error model for the purpose of synthesizing the administrative and survey data to form a unified estimate of crop area. Limitations on the available data preclude us from conducting a genuine, thorough application. Nonetheless, our illustration of methodology holds potential use for a general practitioner.  more » « less
Award ID(s):
1733572
PAR ID:
10297274
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Statistical Journal of the IAOS
Volume:
37
Issue:
2
ISSN:
1874-7655
Page Range / eLocation ID:
557 to 578
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Precise monitoring of individual crop growth and health status is crucial for precision agriculture practices. However, traditional inspection methods are time-consuming, labor-intensive, prone to human error, and may not provide the comprehensive coverage required for the detailed analysis of crop variability across an entire field. This research addresses the need for efficient and high-resolution crop monitoring by leveraging Unmanned Aerial Vehicle (UAV) imagery and advanced computational techniques. The primary goal was to develop a methodology for the precise identification, extraction, and monitoring of individual corn crops throughout their growth cycle. This involved integrating UAV-derived data with image processing, computational geometry, and machine learning techniques. Bi-weekly UAV imagery was captured at altitudes of 40 m and 70 m from 30 April to 11 August, covering the entire growth cycle of the corn crop from planting to harvest. A time-series Canopy Height Model (CHM) was generated by analyzing the differences between the Digital Terrain Model (DTM) and the Digital Surface Model (DSM) derived from the UAV data. To ensure the accuracy of the elevation data, the DSM was validated against Ground Control Points (GCPs), adhering to standard practices in remote sensing data verification. Local spatial analysis and image processing techniques were employed to determine the local maximum height of each crop. Subsequently, a Voronoi data model was developed to delineate individual crop canopies, successfully identifying 13,000 out of 13,050 corn crops in the study area. To enhance accuracy in canopy size delineation, vegetation indices were incorporated into the Voronoi model segmentation, refining the initial canopy area estimates by eliminating interference from soil and shadows. The proposed methodology enables the precise estimation and monitoring of crop canopy size, height, biomass reduction, lodging, and stunted growth over time by incorporating advanced image processing techniques and integrating metrics for quantitative assessment of fields. Additionally, machine learning models were employed to determine relationships between the canopy sizes, crop height, and normalized difference vegetation index, with Polynomial Regression recording an R-squared of 11% compared to other models. This work contributes to the scientific community by demonstrating the potential of integrating UAV technology, computational geometry, and machine learning for accurate and efficient crop monitoring at the individual plant level. 
    more » « less
  2. Abstract In small area estimation, different data sources are integrated in order to produce reliable estimates of target parameters (e.g., a mean or a proportion) for a collection of small subsets (areas) of a finite population. Regression models such as the linear mixed effects model or M-quantile regression are often used to improve the precision of survey sample estimates by leveraging auxiliary information for which means or totals are known at the area level. In many applications, the unit-level linkage of records from different sources is probabilistic and potentially error-prone. In this article, we present adjustments of the small area predictors that are based on either the linear mixed effects model or M-quantile regression to account for the presence of linkage error. These adjustments are developed from a two-component mixture model that hinges on the assumption of independence of the target and auxiliary variable given incorrect linkage. Estimation and inference is based on composite likelihoods and machinery revolving around the Expectation-Maximization Algorithm. For each of the two regression methods, we propose modified small area predictors and approximations for their mean squared errors. The empirical performance of the proposed approaches is studied in both design-based and model-based simulations that include comparisons to a variety of baselines. 
    more » « less
  3. ABSTRACT Probability surveys are challenged by increasing nonresponse rates, resulting in biased statistical inference. Auxiliary information about populations can be used to reduce bias in estimation. Often continuous auxiliary variables in administrative records are first discretized before releasing to the public to avoid confidentiality breaches. This may weaken the utility of the administrative records in improving survey estimates, particularly when there is a strong relationship between continuous auxiliary information and the survey outcome. In this paper, we propose a two‐step strategy, where the confidential continuous auxiliary data in the population are first utilized to estimate the response propensity score of the survey sample by statistical agencies, which is then included in a modified population data for data users. In the second step, data users who do not have access to confidential continuous auxiliary data conduct predictive survey inference by including discretized continuous variables and the propensity score as predictors using splines in a Bayesian model. We show by simulation that the proposed method performs well, yielding more efficient estimates of population means with 95% credible intervals providing better coverage than alternative approaches. We illustrate the proposed method using the Ohio Army National Guard Mental Health Initiative (OHARNG‐MHI). The methods developed in this work are readily available in the R packageAuxSurvey. 
    more » « less
  4. Abstract Human biomechanical data are often accompanied with measurement noise and behavioral variability. Errors due to such noise and variability are usually exaggerated by fewer trials or shorter trial durations and could be reduced using more trials or longer trial durations. Speeding up such data collection by lowering number of trials or trial duration, while improving the accuracy of statistical estimates, would be of particular interest in wearable robotics applications and when the human population studied is vulnerable (e.g., the elderly). Here, we propose the use of the James–Stein estimator (JSE) to improve statistical estimates with a given amount of data or reduce the amount of data needed for a given accuracy. The JSE is a shrinkage estimator that produces a uniform reduction in the summed squared errors when compared with the more familiar maximum likelihood estimator (MLE), simple averages, or other least squares regressions. When data from multiple human participants are available, an individual participant’s JSE can improve upon MLE by incorporating information from all participants, improving overall estimation accuracy on average. Here, we apply the JSE to multiple time series of kinematic and metabolic data from the following parameter estimation problems: foot placement control during level walking, energy expenditure during circle walking, and energy expenditure during resting. We show that the resulting estimates improve accuracy—that is, the James–Stein estimates have lower summed squared error from the ‘true’ value compared with more conventional estimates. 
    more » « less
  5. Abstract Often, government agencies and survey organizations know the population counts or percentages for some of the variables in a survey. These may be available from auxiliary sources, for example administrative databases or other high-quality surveys. We present and illustrate a model-based framework for leveraging such auxiliary marginal information when handling unit and item nonresponse. We show how one can use the margins to specify different missingness mechanisms for each type of nonresponse. We use the framework to impute missing values in voter turnout in a subset of data from the US Current Population Survey. In doing so, we examine the sensitivity of results to different assumptions about the unit and item nonresponse. 
    more » « less