Title: Six Statistical Senses
This article proposes a set of categories, each one representing a particular distillation of important statistical ideas. Each category is labeled a “sense” because we think of these as essential in helping every statistical mind connect in constructive and insightful ways with statistical theory, methodologies, and computation, toward the ultimate goal of building statistical phronesis. The illustration of each sense with statistical principles and methods provides a sensical tour of the conceptual landscape of statistics, as a leading discipline in the data science ecosystem. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 10 is March 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Award ID(s):
1812063
PAR ID:
10390267
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Annual Review of Statistics and Its Application
Volume:
10
Issue:
1
ISSN:
2326-8298
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. How can we make sense of large-scale recordings of neural activity across learning? Theories of neural network learning with their origins in statistical physics offer a potential answer: for a given task, there is often a small set of summary statistics sufficient to predict performance as the network learns. Here, we review recent advances in how summary statistics can be used to build a theoretical understanding of neural network learning. We then argue that this perspective can inform the analysis of neural data, enabling a better understanding of learning in biological and artificial neural networks.
  2. It is widely expected that systems which fully thermalize are chaotic in the sense of exhibiting random-matrix statistics of their energy level spacings, whereas integrable systems exhibit Poissonian statistics. In this paper, we investigate a third class: spin glasses. These systems are partially chaotic but do not achieve full thermalization due to large free energy barriers. We examine the level spacing statistics of a canonical infinite-range quantum spin glass, the quantum p-spherical model, using an analytic path integral approach. We find statistics consistent with a direct sum of independent random matrices, and show that the number of such matrices is equal to the number of distinct metastable configurations, that is, the exponential of the spin glass “complexity” as obtained from the quantum Thouless-Anderson-Palmer equations. We also consider the statistical properties of the complexity itself and identify a set of contributions to the path integral which suggest a Poissonian distribution for the number of metastable configurations. Our results show that level spacing statistics can probe the ergodicity-breaking in quantum spin glasses and provide a way to generalize the notion of spin glass complexity beyond models with a semi-classical limit.
  3. Tabulated statistics of road networks at the level of intersections and for built-up areas for each decade from 1900 to 2010, and for 2015, for each core-based statistical area (CBSA, i.e., metropolitan and micropolitan statistical area) in the conterminous United States. These areas are derived from historical road networks developed by Johannes Uhl. See Burghardt et al. (2022) for details on the data processing. Spatial coverage: all CBSAs that are covered by the HISDAC-US historical settlement layers. This dataset includes around 2,700 U.S. counties. In the remaining counties, construction-year coverage in the underlying ZTRAX data (Zillow Transaction and Assessment Dataset) is low. See Uhl et al. (2021) for details. All data created by Keith A. Burghardt, USC Information Sciences Institute, USA.
     Codebook: these CBSA statistics are stratified by degree of aggregation.
     - CBSA_stats_diffFrom1950: Change in CBSA-aggregated patch statistics between 1950 and 2015
     - CBSA_stats_by_decade: CBSA-aggregated patch statistics for each decade from 1900-2010, plus 2015
     - CBSA_stats_by_decade: CBSA-aggregated cumulative patch statistics for each decade from 1900-2010, plus 2015. All roads created up to a given decade are used for calculating statistics.
     - Patch_stats_by_decade: Individual patch statistics for each decade from 1900-2010, plus 2015
     - Patch_stats_by_decade: Individual cumulative patch statistics for each decade from 1900-2010, plus 2015. All roads created up to a given decade are used for calculating statistics.
     The statistics are the following:
     - msaid: CBSA code
     - id: (if patch statistics) arbitrary int unique to each patch within the CBSA that year
     - year: year of statistics
     - pop: population within all CBSA counties
     - patch_bupr: built-up property records (BUPR) within a patch (or sum of patches within CBSA)
     - patch_bupl: built-up property l (BUPL) within a patch (or sum of patches within CBSA)
     - patch_bua: built-up area (BUA) within a patch (or sum of patches within CBSA)
     - all_bupr: same as above, but for all data in 2015, regardless of whether properties were in patches
     - all_bupl: same as above, but for all data in 2015, regardless of whether properties were in patches
     - all_bua: same as above, but for all data in 2015, regardless of whether properties were in patches
     - num_nodes: number of nodes (intersections)
     - num_edges: number of edges (roads between intersections)
     - distance: total road length in km
     - k_mean: mean number of undirected roads per intersection
     - k1: fraction of nodes with degree 1
     - k4plus: fraction of nodes with degree 4+
     - bearing: histogram of different bearings between intersections
     - entropy: entropy of bearing histogram
     - mean_local_gridness: griddedness used in text
     - mean_local_gridness_max: same as griddedness used in text, but assumes we can have up to 3 quadrilaterals for degree 3 (maximum possible, although intersections will not necessarily create right angles)
     Code available at https://github.com/johannesuhl/USRoadNetworkEvolution.
     References: Burghardt, K., Uhl, J., Lerman, K., & Leyk, S. (2022). Road Network Evolution in the Urban and Rural United States Since 1900. Computers, Environment and Urban Systems.
  4. Algebraic statistics uses tools from algebra (especially from multilinear algebra, commutative algebra, and computational algebra), geometry, and combinatorics to provide insight into knotty problems in mathematical statistics. In this review, we illustrate this on three problems related to networks: network models for relational data, causal structure discovery, and phylogenetics. For each problem, we give an overview of recent results in algebraic statistics, with emphasis on the statistical achievements made possible by these tools and their practical relevance for applications to other scientific disciplines. 
  5. The inertial subrange of turbulent scales is commonly reflected by a power-law signature in ensemble statistics such as the energy spectrum and structure functions – both in theory and from observations. Despite promising findings on the topic of fractal geometries in turbulence, there is no accepted picture of the physical flow features corresponding to this statistical signature in the inertial subrange. The present study uses boundary layer turbulence measurements to evaluate the self-similar geometric properties of velocity isosurfaces and investigate their influence on statistics for the velocity signal. The fractal dimension of streamwise velocity isosurfaces, indicating statistical self-similarity in the size of ‘wrinkles’ along each isosurface, is shown to be constant only within the inertial subrange of scales. For the transition between the inertial subrange and production range, it is inferred that the largest wrinkles become increasingly confined by the overall size of large-scale coherent velocity regions such as uniform momentum zones. The self-similarity of isosurfaces yields power-law trends in subsequent one-dimensional statistics. For instance, the theoretical 2/3 power-law exponent for the structure function can be recovered by considering the collective behaviour of numerous isosurface level sets. The results suggest that the physical presence of inertial subrange eddies is manifested in the self-similar wrinkles of isosurfaces.
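Item 1's central claim, that a small set of summary statistics can predict performance during learning, has a classic textbook instance that we use here purely as an illustration (our choice of example, not one taken from the review): the teacher-student sign perceptron. For isotropic Gaussian inputs, the test error of a student weight vector depends only on its normalized overlap ρ with the teacher, via ε = arccos(ρ)/π. The dimension, noise level, and sample size in this sketch are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 100           # input dimension (assumption)
n_test = 50_000   # number of Gaussian test inputs (assumption)

# The teacher defines the target rule y = sign(teacher . x).
teacher = rng.normal(size=d)
# Hypothetical student "partway through learning": the teacher plus noise.
student = teacher + 1.5 * rng.normal(size=d)

# Summary statistic (order parameter): normalized student-teacher overlap.
rho = student @ teacher / (np.linalg.norm(student) * np.linalg.norm(teacher))
predicted_error = np.arccos(rho) / np.pi

# Empirical test error on fresh isotropic Gaussian inputs.
x = rng.normal(size=(n_test, d))
empirical_error = np.mean(np.sign(x @ student) != np.sign(x @ teacher))
print(f"rho = {rho:.3f}, predicted = {predicted_error:.4f}, empirical = {empirical_error:.4f}")
```

Any student with the same ρ has the same expected error, however its individual weights are arranged, which is the sense in which one scalar summarizes a d-dimensional weight vector in this toy model.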
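To make the contrast in item 2 between random-matrix and Poissonian level statistics concrete, here is a minimal numerical sketch (not the paper's analytic path-integral method). It uses the ratio of consecutive level spacings, r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}), whose mean is commonly quoted as about 0.53 for the Gaussian Orthogonal Ensemble (GOE) and 2 ln 2 - 1 ≈ 0.386 for uncorrelated (Poisson) levels. The matrix sizes and block counts are arbitrary assumptions; the block-diagonal case is only a loose analogue of the paper's "direct sum of independent random matrices".

```python
import numpy as np

def goe_matrix(n, rng):
    """Sample an n x n real symmetric matrix from the Gaussian Orthogonal Ensemble."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / 2.0

def mean_spacing_ratio(levels):
    """Mean of r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}) over consecutive spacings s_i."""
    s = np.diff(np.sort(levels))
    return np.mean(np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:]))

rng = np.random.default_rng(0)

# A single chaotic block: level repulsion pushes <r> toward the GOE value.
r_goe = mean_spacing_ratio(np.linalg.eigvalsh(goe_matrix(1000, rng)))

# Block-diagonal (direct-sum) matrix of 100 independent 10 x 10 GOE blocks: its
# spectrum is the union of the block spectra, so we concatenate block eigenvalues.
r_sum = mean_spacing_ratio(np.concatenate(
    [np.linalg.eigvalsh(goe_matrix(10, rng)) for _ in range(100)]))

# Uncorrelated (Poissonian) levels for comparison.
r_poisson = mean_spacing_ratio(rng.uniform(size=1000))

print(f"GOE {r_goe:.3f} | direct sum {r_sum:.3f} | Poisson {r_poisson:.3f}")
```

The spacing ratio is convenient because it is insensitive to the local density of states, so no unfolding step is needed. With many independent blocks the combined spectrum looks nearly Poissonian even though each block is chaotic; with only a few blocks the value falls between the GOE and Poisson limits.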
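For readers working with the road-network dataset in item 3, the following sketch computes k_mean, k1, k4plus, and a bearing entropy from an undirected edge list with planar node coordinates. It is a hypothetical re-implementation, not the code in the linked repository, and it rests on stated assumptions: 36 bearing bins of 10 degrees, each road counted in both directions, and the natural logarithm. The dataset's exact conventions may differ.

```python
import math
from collections import Counter

def degree_stats(nodes, edges):
    """Return (k_mean, k1, k4plus) for an undirected graph given as node and edge lists."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    degs = [degree[n] for n in nodes]
    n = len(nodes)
    k_mean = sum(degs) / n                        # mean roads per intersection (2E / N)
    k1 = sum(1 for k in degs if k == 1) / n       # fraction of dead ends
    k4plus = sum(1 for k in degs if k >= 4) / n   # fraction of 4+-way intersections
    return k_mean, k1, k4plus

def bearing_entropy(edges, num_bins=36):
    """Shannon entropy (natural log) of the compass-bearing histogram of undirected edges.

    Nodes are (x, y) coordinates with x pointing east and y north; each road is counted
    in both directions, so a north-south road contributes bearings of 0 and 180 degrees.
    """
    width = 360.0 / num_bins
    counts = Counter()
    for (x1, y1), (x2, y2) in edges:
        for dx, dy in ((x2 - x1, y2 - y1), (x1 - x2, y1 - y2)):
            bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # clockwise from north
            counts[int(bearing // width) % num_bins] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical example: a 3 x 3 grid of intersections joined by 12 road segments.
nodes = [(x, y) for x in range(3) for y in range(3)]
edges = [((x, y), (x + 1, y)) for x in range(2) for y in range(3)]
edges += [((x, y), (x, y + 1)) for x in range(3) for y in range(2)]

k_mean, k1, k4plus = degree_stats(nodes, edges)
entropy = bearing_entropy(edges)
print(k_mean, k1, k4plus, entropy)  # k_mean = 24/9, k1 = 0, k4plus = 1/9, entropy = ln 4 (up to rounding)
```

Counting both directions makes the histogram symmetric, so an edge's contribution does not depend on which endpoint happens to be listed first.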
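As a small taste of the algebraic viewpoint in item 4 (our example, not one from the review): for two discrete variables, the independence model is exactly the set of probability tables of rank one, i.e. the tables on which every 2×2 minor p_ik p_jl - p_il p_jk vanishes. The sketch checks these polynomial equations for a table built as an outer product of hypothetical marginals and for a hypothetical dependent table.

```python
from itertools import combinations

import numpy as np

def two_by_two_minors(p):
    """All 2x2 minors p[i,k]*p[j,l] - p[i,l]*p[j,k] of a 2D probability table."""
    rows, cols = p.shape
    return np.array([p[i, k] * p[j, l] - p[i, l] * p[j, k]
                     for i, j in combinations(range(rows), 2)
                     for k, l in combinations(range(cols), 2)])

# Independence: the joint table is the outer product of its row and column marginals.
p_indep = np.outer([0.2, 0.5, 0.3], [0.6, 0.4])
# A hypothetical dependent table (entries sum to 1).
p_dep = np.array([[0.30, 0.00],
                  [0.10, 0.30],
                  [0.20, 0.10]])

print(np.abs(two_by_two_minors(p_indep)).max())  # ~0: all minors vanish
print(np.abs(two_by_two_minors(p_dep)).max())    # clearly nonzero
```

Testing independence then amounts to asking how far a table is from this algebraic variety, which is where the algebraic machinery described in the review comes in.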
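The link between a Kolmogorov-type energy spectrum E(k) ∝ k^(-5/3) and the 2/3 structure-function exponent mentioned in item 5 can be checked numerically on a synthetic signal. This sketch does not reproduce the study's isosurface analysis. It builds a hypothetical periodic 1D signal with random phases and Fourier amplitudes ∝ k^(-5/6), then fits the log-log slope of the second-order structure function S2(r) = ⟨(u(x+r) - u(x))^2⟩ over an intermediate range of lags; the signal length and lag range are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2**16                      # number of samples in the periodic signal (assumption)
k = np.arange(n // 2 + 1)      # rfft wavenumbers 0 .. n/2

# Amplitude |u_k| ~ k^(-5/6) gives an energy spectrum |u_k|^2 ~ k^(-5/3); drop the mean (k = 0).
amp = np.zeros(k.size)
amp[1:] = k[1:] ** (-5.0 / 6.0)
phases = rng.uniform(0.0, 2.0 * np.pi, size=k.size)
u = np.fft.irfft(amp * np.exp(1j * phases), n=n)

def structure_function(signal, lag):
    """Second-order structure function S2(r) = mean((u(x+r) - u(x))^2), periodic in x."""
    return np.mean((np.roll(signal, -lag) - signal) ** 2)

lags = np.array([16, 32, 64, 128, 256, 512, 1024])
s2 = np.array([structure_function(u, int(r)) for r in lags])

# Least-squares slope of log S2 versus log r; Kolmogorov scaling predicts 2/3.
slope = np.polyfit(np.log(lags), np.log(s2), 1)[0]
print(f"fitted exponent: {slope:.3f}")
```

Because S2 is a spatial average over a periodic signal, the fitted exponent does not depend on the random phases; it deviates slightly from 2/3 because of the finite range of wavenumbers.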