Title: Missing not at random and the nonparametric estimation of the spectral density
The aim of the article is twofold: (i) present a pivotal setting where using an extra experiment for restoring information lost due to missing not at random (MNAR) is practically feasible; (ii) attract attention to a wide spectrum of new research topics created by the proposed methodology of exploring the missing mechanism. It is well known that if the likelihood of missing an observation depends on its value, then the missing is MNAR, no consistent estimation is possible, and the only way to recover destroyed information is to study the likelihood of missing via an extra experiment. One of the main practical issues with an extra-sample approach is as follows. Let $n$ and $m$ be the numbers of observations in a MNAR time series and in an extra sample exploring the likelihood of missing, respectively. An oracle, that knows the likelihood of missing, can estimate the spectral density of an ARMA-type spectral density with the MISE proportional to , while a differentiable likelihood may be estimated only with the MISE proportional to $m^{-2/3}$. On first glance, these familiar facts yield that the proposed approach is impractical because $m$ must be in order larger than $n$ to match the oracle. Surprisingly, the article presents the theory and a numerical study indicating that $m$ may be in order smaller than $n$ and still the statistician can match performance of the oracle. The proposed methodology is used for the analysis of MNAR time series of systolic blood pressure of a person with immunoglobulin D multiple myeloma. A number of possible extensions and future research topics are outlined.
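The role of the oracle can be illustrated with a small simulation. The sketch below is not the article's spectral density estimator; it is a minimal illustration, under stated assumptions, of why a MNAR sample misleads a naive estimator and why knowing the likelihood of missing helps. The logistic `availability` function, the standard normal data, and all function names are hypothetical choices made for this example only.

```python
import math
import random


def availability(x):
    """Hypothetical likelihood that the value x is observed (logistic in x)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_mnar(n, seed=0):
    """Draw n standard normal values and hide each with probability 1 - availability(x)."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The chance of missing depends on the value itself, so the data are MNAR.
    observed = [x if rng.random() < availability(x) else None for x in values]
    return values, observed


def naive_mean(observed):
    """Average of the available values only; biased under this MNAR mechanism."""
    seen = [x for x in observed if x is not None]
    return sum(seen) / len(seen)


def oracle_weighted_mean(observed):
    """Inverse-probability-weighted mean that uses the known availability likelihood."""
    total = sum(x / availability(x) for x in observed if x is not None)
    return total / len(observed)


if __name__ == "__main__":
    _, observed = simulate_mnar(100_000, seed=1)
    print("naive mean:          ", naive_mean(observed))
    print("oracle-weighted mean:", oracle_weighted_mean(observed))
```

In this toy setting the true mean is 0. The naive average drifts upward because large values are more likely to be observed, while the weighted average stays close to 0. In practice the likelihood is unknown and must itself be estimated from the extra sample of size $m$, which is the trade-off the abstract discusses.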
Award ID(s): 1915845
PAR ID: 10574012
Author(s) / Creator(s):
Publisher / Repository: Time Series Analysis
Date Published:
Journal Name: Journal of Time Series Analysis
Volume: 41
Issue: 5
ISSN: 0143-9782
Page Range / eLocation ID: 652 to 675
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Missing data are ubiquitous in many domains such as healthcare. When these data entries are not missing completely at random, the (conditional) independence relations in the observed data may be different from those in the complete data generated by the underlying causal process. Consequently, simply applying existing causal discovery methods to the observed data may lead to wrong conclusions. In this paper, we aim at developing a causal discovery method to recover the underlying causal structure from observed data that are missing under different mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). With missingness mechanisms represented by missingness graphs (m-graphs), we analyze conditions under which additional correction is needed to derive conditional independence/dependence relations in the complete data. Based on our analysis, we propose Missing Value PC (MVPC), which extends the PC algorithm to incorporate additional corrections. Our proposed MVPC is shown in theory to give asymptotically correct results even on data that are MAR or MNAR. Experimental results on both synthetic data and real healthcare applications illustrate that the proposed algorithm is able to find correct causal relations even in the general case of MNAR.
  2. Aims. We investigate the previous microlensing data collected by the KMTNet survey in search of anomalous events for which no precise interpretations of the anomalies had been suggested. From this investigation, we find that the anomaly in the lensing light curve of the event KMT-2021-BLG-1547 is approximately described by a binary-lens (2L1S) model with a lens possessing a giant planet, but the model leaves unexplained residuals. Methods. We investigated the origin of the residuals by testing more sophisticated models that include either an extra lens component (3L1S model) or an extra source star (2L2S model) on top of the 2L1S configuration of the lens system. From these analyses, we find that the residuals from the 2L1S model originate from the existence of a faint companion to the source. The 2L2S solution substantially reduces the residuals and improves the model fit by $\Delta\chi^2 = 67.1$ with respect to the 2L1S solution. The 3L1S solution also improves the fit, but its fit is worse than that of the 2L2S solution by $\Delta\chi^2 = 24.7$. Results. According to the 2L2S solution, the lens of the event is a planetary system with planet and host masses $(M_{\rm p}/M_{\rm J}, M_{\rm h}/M_\odot) = (1.47^{+0.64}_{-0.77}, 0.72^{+0.32}_{-0.38})$ lying at a distance $D_{\rm L} = 5.07^{+0.98}_{-1.50}$ kpc, and the source is a binary composed of a subgiant primary of a late G or an early K spectral type and a main-sequence companion of a K spectral type. The event demonstrates the need for sophisticated modeling of unexplained anomalies if one wants to construct a complete microlensing planet sample.
  3. Traditional methods for handling incomplete data, including Multiple Imputation and Maximum Likelihood, require that the data be Missing At Random (MAR). In most cases, however, missingness in a variable depends on the underlying value of that variable. In this work, we devise model-based methods to consistently estimate mean, variance and covariance given data that are Missing Not At Random (MNAR). While previous work on MNAR data requires variables to be discrete, we extend the analysis to continuous variables drawn from Gaussian distributions. We demonstrate the merits of our techniques by comparing them empirically to state-of-the-art software packages.
  4. Abstract: A search for long-lived heavy neutrinos (N) in the decays of B mesons produced in proton-proton collisions at $\sqrt{s} = 13$ TeV is presented. The data sample corresponds to an integrated luminosity of 41.6 fb$^{-1}$ collected in 2018 by the CMS experiment at the CERN LHC, using a dedicated data stream that enhances the number of recorded events containing B mesons. The search probes heavy neutrinos with masses in the range $1 < m_N < 3$ GeV and decay lengths in the range $10^{-2} < c\tau_N < 10^{4}$ mm, where $\tau_N$ is the N proper mean lifetime. Signal events are defined by the signature $B \to \ell_B N X$; $N \to \ell^{\pm}\pi^{\mp}$, where the leptons $\ell_B$ and $\ell$ can be either a muon or an electron, provided that at least one of them is a muon. The hadronic recoil system, X, is treated inclusively and is not reconstructed. No significant excess of events over the standard model background is observed in any of the $\ell^{\pm}\pi^{\mp}$ invariant mass distributions. Limits at 95% confidence level on the sum of the squares of the mixing amplitudes between heavy and light neutrinos, $|V_N|^2$, and on $c\tau_N$ are obtained in different mixing scenarios for both Majorana and Dirac-like N particles. The most stringent upper limit $|V_N|^2 < 2.0 \times 10^{-5}$ is obtained at $m_N = 1.95$ GeV for the Majorana case where N mixes exclusively with muon neutrinos. The limits on $|V_N|^2$ for masses $1 < m_N < 1.7$ GeV are the most stringent from a collider experiment to date.
  5. The stellar initial mass function (IMF) is critical to our understanding of star formation and the effects of young stars on their environment. On large scales, it enables us to use tracers such as UV or Hα emission to estimate the star formation rate of a system and interpret unresolved star clusters across the Universe. So far, there is little firm evidence of large-scale variations of the IMF, which is thus generally considered “universal”. Stars form from cores, and it is now possible to estimate core masses and compare the core mass function (CMF) with the IMF, which it presumably produces. The goal of the ALMA-IMF large programme is to measure the core mass function at high linear resolution (2700 au) in 15 typical Milky Way protoclusters spanning a mass range of $2.5 \times 10^{3}$ to $32.7 \times 10^{3}\,M_\odot$. In this work, we used two different core extraction algorithms to extract ≈680 gravitationally bound cores from these 15 protoclusters. We adopted a per core temperature using the temperature estimate from the point-process mapping Bayesian method (PPMAP). A power-law fit to the CMF of the sub-sample of cores above the 1.64 $M_\odot$ completeness limit (330 cores) through the maximum likelihood estimate technique yields a slope of 1.97 ± 0.06, which is significantly flatter than the 2.35 Salpeter slope. Assuming a self-similar mapping between the CMF and the IMF, this result implies that these 15 high-mass protoclusters will generate atypical IMFs. This sample currently is the largest sample that was produced and analysed self-consistently, derived at matched physical resolution, with per core temperature estimates, and cores as massive as 150 $M_\odot$. We provide both the raw source extraction catalogues and the catalogues listing the source size, temperature, mass, spectral indices, and so on in the 15 protoclusters.