Abstract: ESA and NASA are moving forward with plans to launch LISA around 2034. With data from the Illustris cosmological simulation, we provide analysis of LISA detection rates accompanied by characterization of the merging massive black hole population. Massive black holes of total mass ∼10⁵–10¹⁰ M⊙ are the focus of this study. We evolve Illustris massive black hole mergers, which form at separations on the order of the simulation resolution (∼kpc scales), through coalescence with two different treatments for the binary massive black hole evolutionary process. The coalescence times of the population, as well as physical properties of the black holes, form a statistical basis for each evolutionary treatment. From these bases, we Monte Carlo synthesize many realizations of the merging massive black hole population to build mock LISA detection catalogs. We analyze how our massive black hole binary evolutionary models affect detection rates and the associated parameter distributions measured by LISA. With our models, we find massive black hole binary detection rates with LISA of ∼0.5–1 yr⁻¹ for massive black holes with masses greater than 10⁵ M⊙. This should be treated as a lower limit, primarily because our massive black hole sample does not include masses below 10⁵ M⊙, which may significantly add to the observed rate. We suggest reasons why we predict lower detection rates compared to much of the literature.
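A minimal sketch, in Python, of the kind of Monte Carlo synthesis described above: the merger population, the delay-time distribution, and the detectability cut are random placeholders rather than the Illustris data or the paper's evolutionary models, and no light-cone or cosmological weighting is attempted.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder merger population: one entry per simulated massive black hole merger.
# In the paper these quantities come from Illustris plus a binary-evolution model;
# here they are random stand-ins used only to show the bookkeeping.
n_mergers = 10_000
total_mass = 10 ** rng.uniform(5, 10, n_mergers)     # total binary mass, M_sun
formation_time = rng.uniform(0.0, 13.0, n_mergers)   # Gyr after the Big Bang
HUBBLE_TIME = 13.8                                    # Gyr, rough age of the Universe

def one_realization(mean_delay_gyr=1.0):
    """One Monte Carlo realization: draw a coalescence delay per binary from an
    assumed delay-time distribution and keep systems that merge by today."""
    delay = rng.exponential(mean_delay_gyr, n_mergers)   # toy delay-time model
    coalesced = formation_time + delay < HUBBLE_TIME
    # Toy detectability cut: the LISA band is most sensitive to ~1e5-1e7 M_sun binaries.
    in_band = coalesced & (total_mass < 1e7)
    return total_mass[in_band]

# Repeating the draw gives a distribution of mock-catalog sizes and source masses.
catalog_sizes = [one_realization().size for _ in range(200)]
print(f"median mock-catalog size: {np.median(catalog_sizes):.0f}")
```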
This content will become publicly available on September 30, 2026

Improved Local Indicators of Spatial Association Analysis for Zero-Heavy Crack Cocaine Seizure Data
Local Indicators of Spatial Association (LISA) analysis is a useful tool for analyzing and extracting meaningful insights from geographic data. It provides informative statistical analysis that highlights areas of high and low activity. However, LISA methods may not be appropriate for zero-heavy data: without the correct mathematical context, the patterns the analysis identifies can be misinterpreted. We demonstrate these issues through statistical analysis and provide the appropriate context for interpreting LISA results for zero-heavy data. We then propose an improved LISA analysis method for spatial data in which the majority of values are zero. This work constitutes a possible path to a more appropriate understanding of the underlying spatial relationships. Applying our proposed methodology to crack cocaine seizure data in the United States, we show how our improved methods identify different spatial patterns, which in our context could lead to different real-world law enforcement strategies. LISA analysis is a popular statistical approach that supports policy analysis and design, and zero-heavy data are common in these settings. We therefore provide a framework tailored to zero-heavy contexts, improving interpretations and providing a finer categorization of observed data, ultimately supporting better decisions in the many fields where spatial data are foundational.
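For context, the baseline statistic behind LISA analysis is the local Moran's I. The NumPy sketch below uses toy data (not the seizure dataset) and implements only the standard statistic, not the paper's zero-heavy refinement; it illustrates the kind of pattern that needs careful interpretation when most values are zero.

```python
import numpy as np

def local_morans_i(y, w):
    """Standard local Moran's I for values y with a row-standardized spatial
    weights matrix w; this is the baseline statistic, not the paper's
    zero-heavy refinement."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    s2 = np.mean(z ** 2)          # one common choice of denominator
    return (z / s2) * (w @ z)     # z_i / s2 times the spatial lag of z

# Toy zero-heavy data: 6 regions on a line, most with zero seizures.
y = np.array([0, 0, 0, 0, 5, 7], dtype=float)
n = y.size
w = np.zeros((n, n))
for i in range(n):                # neighbors are the adjacent regions on the line
    if i > 0:
        w[i, i - 1] = 1.0
    if i < n - 1:
        w[i, i + 1] = 1.0
w = w / w.sum(axis=1, keepdims=True)   # row-standardize

print(local_morans_i(y, w))
# Zero regions surrounded by other zeros sit below the mean with below-mean
# neighbors, so they register as positive ("low-low") association even though
# nothing happened there; this is the kind of pattern whose interpretation
# needs the extra context the paper provides.
```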
- Award ID(s): 2039862
- PAR ID: 10642005
- Publisher / Repository: INFORMS
- Date Published:
- Journal Name: INFORMS Journal on Data Science
- ISSN: 2694-4022
- Subject(s) / Keyword(s): exploratory spatial data analysis; LISA analysis; zero-heavy data; local Moran’s I; U.S. cocaine seizures
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
Ossi, Federico; Hachem, Fatima; Robira, Benjamin; Ellis Soto, Diego; Rutz, Christian; Dodge, Somayeh; Cagnacci, Francesca; Damiani, Maria Luisa (Ed.): Data collected about routine human activity and mobility is used in diverse applications to improve our society. Robust models are needed to address the challenges of our increasingly interconnected world. Methods capable of portraying the dynamic properties of complex human systems, such as simulation modeling, must meet rigorous data requirements. Modern data sources, like SafeGraph, provide aggregate data collected from location-aware technologies. Opportunities and challenges arise in incorporating these new data into existing analysis and modeling methods. Our research employs a multiscale spatial similarity index to compare diverse origin-destination mobility datasets. Established distance ranges accommodate spatial variability in the model’s datasets. This paper explores how similarity scores change with different aggregations to address discrepancies in the source data’s temporal granularity. We suggest possible explanations for variations in the similarity scores and extract characteristics of human mobility for the study area. The multiscale spatial similarity index may be integrated into a vast array of analysis and modeling workflows, either during preliminary analysis or in later evaluation phases as a method of data validation (e.g., for agent-based models). We propose that the demonstrated tool has potential to enhance mobility modeling methods in the context of complex human systems. A toy per-distance-band comparison sketch appears after this list.
- 
Graduate-level statistics education curricula often emphasize technical instruction in theory and methodology but can fail to provide adequate practical training in applications and collaboration skills. We argue that a statistical collaboration center (“stat lab”) structured in the style of the University of Colorado Boulder’s Laboratory for Interdisciplinary Statistical Analysis (LISA) is an effective mechanism for providing graduate students with necessary training in technical, non-technical, and job-related skills. We summarize the operating structure of LISA, and then provide evidence of its positive impact on students via analyses of a survey completed by 123 collaborators who worked in LISA between 2008 and 2015, while it was housed at Virginia Tech. Students described their work in LISA as having had a positive impact on acquiring technical (94%) and non-technical (95%) statistics skills. Five-sixths (83%) of the students reported that these skills will help or have helped them advance in their careers. We call for the integration of stat labs into statistics and data science programs as part of a comprehensive and modern statistics education, and for further research on students’ experience in these labs and the impact on student outcomes.
- 
Abstract: The performance of computational methods and software to identify differentially expressed features in single‐cell RNA‐sequencing (scRNA‐seq) has been shown to be influenced by several factors, including the choice of the normalization method used and the choice of the experimental platform (or library preparation protocol) to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on their own assumptions to model scRNA‐seq expression features. To model the technological variability in cross‐platform scRNA‐seq data, here we propose to use Tweedie generalized linear models that can flexibly capture a large dynamic range of observed scRNA‐seq expression profiles across experimental platforms induced by platform‐ and gene‐specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero‐inflated Tweedie model that allows zero probability mass to exceed a traditional Tweedie distribution to model zero‐inflated scRNA‐seq data with excessive zero counts. Using both synthetic and published plate‐ and droplet‐based scRNA‐seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms the state‐of‐the‐art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open‐source software (R/Bioconductor package) is available at https://github.com/himelmallick/Tweedieverse. A minimal Tweedie GLM sketch appears after this list.
- 
We consider the task of heavy-tailed statistical estimation given streaming p-dimensional samples. This can also be viewed as stochastic optimization under heavy-tailed distributions, with an additional O(p) space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the stochastic gradients, which we show is critical when analyzing stochastic optimization problems arising from general statistical estimation problems. Our results guarantee convergence not just in expectation but with exponential concentration, and moreover do so using an O(1) batch size. We provide consequences of our results for mean estimation and linear regression. Finally, we provide empirical corroboration of our results and algorithms via synthetic experiments for mean estimation and linear regression. A toy clipped-SGD sketch for mean estimation appears after this list.
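For the mobility-data item above (Ossi et al.), the exact multiscale spatial similarity index is not specified in this summary; the Python sketch below only illustrates the general idea of scoring similarity between two origin-destination flow matrices separately within fixed distance bands, with made-up flows standing in for SafeGraph-style data.

```python
import numpy as np

def banded_similarity(flows_a, flows_b, dist, bands):
    """Toy multiscale comparison of two origin-destination flow matrices: score
    how similar the two flow patterns are within each distance band. The index
    used in the paper may differ; this only illustrates per-scale comparison."""
    scores = []
    for lo, hi in bands:
        mask = (dist >= lo) & (dist < hi)
        a, b = flows_a[mask], flows_b[mask]
        if a.sum() == 0 and b.sum() == 0:
            scores.append(1.0)                    # nothing to compare in this band
            continue
        a = a / max(a.sum(), 1e-12)               # normalize to flow shares
        b = b / max(b.sum(), 1e-12)
        scores.append(1.0 - 0.5 * np.abs(a - b).sum())   # 1.0 means identical shares
    return scores

# Made-up data: 20 zones with random pairwise distances, and two versions of the
# same mobility source at different temporal aggregations.
rng = np.random.default_rng(1)
n = 20
dist = rng.uniform(0.0, 50.0, (n, n))                    # km
flows_fine = rng.poisson(3.0, (n, n)).astype(float)      # finer-grained counts
flows_coarse = flows_fine + rng.poisson(1.0, (n, n))     # coarser aggregate

bands = [(0, 10), (10, 25), (25, 50)]    # the "established distance ranges"
print(banded_similarity(flows_fine, flows_coarse, dist, bands))
```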
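For the Tweedie item above, Tweedieverse itself is an R/Bioconductor package; as a rough Python illustration of the underlying model, the statsmodels sketch below fits a single-gene Tweedie GLM to toy counts with many exact zeros (no zero-inflated component).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Toy single-gene example: 100 cells in two conditions, with sparse zero counts.
n_cells = 100
condition = np.repeat([0.0, 1.0], n_cells // 2)
expr = np.where(rng.random(n_cells) < 0.6, 0.0,
                rng.gamma(shape=2.0, scale=1.0 + 2.0 * condition, size=n_cells))

X = sm.add_constant(condition)
# A variance power between 1 and 2 gives a compound Poisson-gamma Tweedie model,
# which places positive probability mass exactly at zero.
fit = sm.GLM(expr, X, family=sm.families.Tweedie(var_power=1.5)).fit()
print(fit.params, fit.pvalues)   # the second entry is the condition (DE) effect
```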
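For the streaming-estimation item above, the sketch below is a toy version of clipped stochastic gradient descent for mean estimation with O(p) memory and batch size one; the clipping level and 1/t step size are illustrative choices, not the schedule analyzed in the paper.

```python
import numpy as np

def clipped_sgd_mean(stream, clip=5.0, lr0=1.0):
    """Streaming mean estimation with clipped stochastic gradients, O(p) memory
    and O(1) batch size; a toy version of the clipped-SGD idea."""
    theta = None
    for t, x in enumerate(stream, start=1):
        x = np.asarray(x, dtype=float)
        if theta is None:
            theta = np.zeros_like(x)
        g = theta - x                      # gradient of 0.5 * ||theta - x||^2
        norm = np.linalg.norm(g)
        if norm > clip:                    # clip the stochastic gradient
            g = g * (clip / norm)
        theta = theta - (lr0 / t) * g      # 1/t step size
    return theta

# Heavy-tailed samples (Student-t with 2.5 degrees of freedom) around a true mean.
rng = np.random.default_rng(3)
true_mean = np.array([1.0, -2.0, 0.5])
samples = true_mean + rng.standard_t(df=2.5, size=(5000, 3))
print(clipped_sgd_mean(iter(samples)))    # should land near [1.0, -2.0, 0.5]
```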