Title: Regression with Archaeological Count Data
Archaeological data often come in the form of counts. Understanding why counts of artifacts, subsistence remains, or features vary across time and space is central to archaeological inquiry. A central statistical method for modeling such variation is regression, yet despite sophisticated advances in computational approaches to archaeology, practitioners do not have a standard approach for building, validating, or interpreting the results of count regression. Drawing on advances in ecology, we outline a framework for evaluating regressions with archaeological count data that includes suggestions for model fitting, diagnostics, and interpreting results. We hope these suggestions provide a foundation for advancing regression with archaeological count data to further our understanding of the past.
Award ID(s):
2308299
PAR ID:
10548342
Author(s) / Creator(s):
;
Publisher / Repository:
Advances in Archaeological Practice
Date Published:
Journal Name:
Advances in Archaeological Practice
Volume:
12
Issue:
2
ISSN:
2326-3768
Page Range / eLocation ID:
163 to 172
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to effectively address batch effects in genomic data to overcome these challenges. Many existing methods for batch effects adjustment assume the data follow a continuous, bell-shaped Gaussian distribution. However, in RNA-seq studies the data are typically skewed, over-dispersed counts, so this assumption is not appropriate and may lead to erroneous results. Negative binomial regression models have been used previously to better capture the properties of counts. We developed a batch correction method, ComBat-seq, using a negative binomial regression model that retains the integer nature of count data in RNA-seq studies, making the batch-adjusted data compatible with common differential expression software packages that require integer counts. We show in realistic simulations that the ComBat-seq-adjusted data result in better statistical power and control of false positives in differential expression compared to data adjusted by the other available methods. We further demonstrated in a real data example that ComBat-seq successfully removes batch effects and recovers the biological signal in the data.
  2. Segata, Nicola (Ed.)
    The understanding of bacterial gene function has been greatly enhanced by recent advancements in the deep sequencing of microbial genomes. Transposon insertion sequencing methods combine next-generation sequencing techniques with transposon mutagenesis for the exploration of the essentiality of genes under different environmental conditions. We propose a model-based method that uses regularized negative binomial regression to estimate the change in transposon insertions attributable to gene-environment changes in this genetic interaction study without transformations or uniform normalization. An empirical Bayes model for estimating the local false discovery rate combines unique and total count information to test for genes that show a statistically significant change in transposon counts. When applied to RB-TnSeq (randomized barcode transposon sequencing) and Tn-seq (transposon sequencing) libraries made in strains of Caulobacter crescentus using both total and unique count data, the model was able to identify a set of conditionally beneficial or conditionally detrimental genes for each target condition that shed light on their functions and roles during various stress conditions.
  3. Researchers increasingly rely on aggregations of radiocarbon dates from archaeological sites as proxies for past human populations. This approach has been critiqued on several grounds, including the assumptions that material is deposited, preserved, and sampled in proportion to past population size. However, various attempts to quantitatively assess the approach suggest there may be some validity in assuming date counts reflect relative population size. To add to this conversation, here we conduct a preliminary analysis coupling estimates of ethnographic population density with late Holocene radiocarbon dates across all counties in California. Results show that counts of late Holocene radiocarbon-dated archaeological sites increase significantly as a function of ethnographic population density. This trend is robust across varying sampling windows over the last 5000 BP, though the majority of variation in dated-site counts remains unexplained by population density. Outliers reveal how departures from the central trend may be influenced by regional differences in research traditions, development-driven contract work, organic preservation, and landscape taphonomy. Overall, this exercise provides some support for the “dates-as-data” approach and offers insights into the conditions where the underlying assumptions may or may not hold.
  4. Background: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. Results: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. Conclusion: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
  5. Most of the current public health surveillance methods used in epidemiological studies to identify hotspots of diseases assume that regional disease case counts are independently distributed, and they lack the ability to adjust for confounding covariates. This article proposes a new approach that uses a simultaneous autoregressive (SAR) model, a popular spatial regression approach, within the classical space‐time cumulative sum (CUSUM) framework for detecting changes in the spatial distribution of count data while accounting for risk factors and spatial correlation. We develop expressions for the likelihood ratio test monitoring statistics based on a SAR model with covariates, leading to the proposed space‐time CUSUM test statistic. The effectiveness of the proposed monitoring approach in detecting and identifying step shifts is studied by simulation of various shift scenarios in regional counts. A case study for monitoring regional COVID‐19 infection counts while adjusting for social vulnerability, often correlated with a community's susceptibility towards disease infection, is presented to illustrate the application of the proposed methodology in public health surveillance.