Title: Nonparametric Density Estimation under Distribution Drift
We study nonparametric density estimation in non-stationary drift settings. Given a sequence of independent samples taken from a distribution that gradually changes in time, the goal is to compute the best estimate for the current distribution. We prove tight minimax risk bounds for both discrete and continuous smooth densities, where the minimum is over all possible estimates and the maximum is over all possible distributions that satisfy the drift constraints. Our technique handles a broad class of drift models and generalizes previous results on agnostic learning under drift.
Award ID(s):
1813444
PAR ID:
10451726
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
202
ISSN:
2640-3498
Page Range / eLocation ID:
24251-24270
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    A stable-frequency transmitter with relative radial acceleration to a receiver will show a change in received frequency over time, known as a “drift rate.” For a transmission from an exoplanet, we must account for multiple components of drift rate: the exoplanet’s orbit and rotation, the Earth’s orbit and rotation, and other contributions. Understanding the drift rate distribution produced by exoplanets relative to Earth can (a) help us constrain the range of drift rates to check in a Search for Extraterrestrial Intelligence project to detect radio technosignatures, and (b) help us decide the validity of signals-of-interest, as we can compare drifting signals with expected drift rates from the target star. In this paper, we modeled the drift rate distribution for ∼5300 confirmed exoplanets, using parameters from the NASA Exoplanet Archive (NEA). We find that confirmed exoplanets have drift rates such that 99% of them fall within the ±53 nHz range. This implies a distribution-informed maximum drift rate ∼4 times lower than previous work. To mitigate the observational biases inherent in the NEA, we also simulated an exoplanet population built to reduce these biases. The results suggest that, for a Kepler-like target star without known exoplanets, ±0.44 nHz would be sufficient to account for 99% of signals. This reduction in recommended maximum drift rate is partially due to inclination effects and bias toward short orbital periods in the NEA. These narrowed drift rate maxima will increase the efficiency of searches and save significant computational effort in future radio technosignature searches.

     
  2. Abstract

    We consider the problem of distribution-free predictive inference, with the goal of producing predictive coverage guarantees that hold conditionally rather than marginally. Existing methods such as conformal prediction offer marginal coverage guarantees, where predictive coverage holds on average over all possible test points, but this is not sufficient for many practical applications where we would like to know that our predictions are valid for a given individual, not merely on average over a population. On the other hand, exact conditional inference guarantees are known to be impossible without imposing assumptions on the underlying distribution. In this work, we aim to explore the space in between these two and examine what types of relaxations of the conditional coverage property would alleviate some of the practical concerns with marginal coverage guarantees while still being possible to achieve in a distribution-free setting.
  3. Code snippets are prevalent, but are hard to reuse because they often lack an accompanying environment configuration. Most are not actively maintained, allowing for drift between the most recent possible configuration and the code snippet as the snippet becomes out-of-date over time. Recent work has identified the problem of validating and detecting out-of-date code snippets as the most important consideration for code reuse. However, determining if a snippet is correct, but simply out-of-date, is a non-trivial task. In the best case, breaking changes are well documented, allowing developers to manually determine when a code snippet contains an out-of-date API usage. In the worst case, determining if and when a breaking change was made requires an exhaustive search through previous dependency versions. We present V2, a strategy for determining if a code snippet is out-of-date by detecting discrete instances of configuration drift, where the snippet uses an API which has since undergone a breaking change. Each instance of configuration drift is classified by a failure encountered during validation and a configuration patch, consisting of dependency version changes, which fixes the underlying fault. V2 uses feedback-directed search to explore the possible configuration space for a code snippet, reducing the number of potential environment configurations that need to be validated. When run on a corpus of public Python snippets from prior research, V2 identifies 248 instances of configuration drift. 
  4. Modeling distributions of covariates, or density estimation, is a core challenge in unsupervised learning. However, the majority of work only considers the joint distribution, which has limited utility in practical situations. A more general and useful problem is arbitrary conditional density estimation, which aims to model any possible conditional distribution over a set of covariates, reflecting the more realistic setting of inference based on prior knowledge. We propose a novel method, Arbitrary Conditioning with Energy (ACE), that can simultaneously estimate the distribution p(x_u | x_o) for all possible subsets of unobserved features x_u and observed features x_o. ACE is designed to avoid unnecessary bias and complexity — we specify densities with a highly expressive energy function and reduce the problem to only learning one-dimensional conditionals (from which more complex distributions can be recovered during inference). This results in an approach that is both simpler and higher-performing than prior methods. We show that ACE achieves state-of-the-art for arbitrary conditional likelihood estimation and data imputation on standard benchmarks. 
  5. Abstract. Free-drift estimates of sea ice motion are necessary to produce a seamless observational record combining buoy and satellite-derived sea ice motion vectors. We develop a new parameterization for the free drift of sea ice based on wind forcing, wind turning angle, sea ice state variables (thickness and concentration), and estimates of the ocean currents. Given the fact that the spatial distribution of the wind–ice–ocean transfer coefficient has a similar structure to that of the spatial distribution of sea ice thickness, we take the standard free-drift equation and introduce a wind–ice–ocean transfer coefficient that scales linearly with ice thickness. Results show a mean bias error of −0.5 cm s⁻¹ (low-speed bias) and a root-mean-square error of 5.1 cm s⁻¹, considering daily buoy drift data as truth. This represents a 35 % reduction of the error on drift speed compared to the free-drift estimates used in the Polar Pathfinder dataset (Tschudi et al., 2019b). The thickness-dependent transfer coefficient provides an improved seasonality and long-term trend of the sea ice drift speed, with a minimum (maximum) drift speed in May (October), compared to July (January) for the constant transfer coefficient parameterizations which simply follow the peak in mean surface wind stresses. Over the 1979–2019 period, the trend in sea ice drift in this new model is +0.45 cm s⁻¹ per decade compared with +0.39 cm s⁻¹ per decade from the buoy observations, whereas there is essentially no trend in a free-drift parameterization with a constant transfer coefficient (−0.09 cm s⁻¹ per decade) or the Polar Pathfinder free-drift input data (−0.01 cm s⁻¹ per decade). The optimal wind turning angle obtained from a least-squares fitting is 25°, resulting in a mean error and a root-mean-square error of +3° and 42° on the direction of the drift, respectively. The ocean current estimates obtained from the minimization procedure resolve key large-scale features such as the Beaufort Gyre and Transpolar Drift Stream and are in good agreement with ocean state estimates from the ECCO, GLORYS, and PIOMAS ice–ocean reanalyses, as well as geostrophic currents from dynamical ocean topography, with a root-mean-square difference of 2.4, 2.9, 2.6, and 3.8 cm s⁻¹, respectively. Finally, a repeat of the analysis on two sub-sections of the time series (pre- and post-2000) clearly shows the acceleration of the Beaufort Gyre (particularly along the Alaskan coastline) and an expansion of the gyre in the post-2000s, concurrent with a thinning of the sea ice cover and the observed acceleration of the ice drift speed and ocean currents. This new dataset is publicly available for complementing merged observation-based sea ice drift datasets that include satellite and buoy drift records.