skip to main content


Search for: All records

Award ID contains: 2118953

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    The increasing prevalence of wearable devices enables low-cost, long-term collection of health relevant data such as heart rate, exercise, and sleep signals. Currently these data are used to monitor short term changes with limited interpretation of their relevance to health. These data provide an untapped resource to monitor daily and long-term activity patterns. Changes and trends identified from such data can provide insights and guidance to the management of many chronic conditions that change over time. In this study we conducted a machine learning based analysis of longitudinal heart rate data collected over multiple years from Fitbit devices. We built a multi-resolutional pipeline for time series analysis, using model-free clustering methods inspired by statistical conformal prediction framework. With this method, we were able to detect health relevant events, their interesting patterns (e.g., daily routines, seasonal differences, and anomalies), and correlations to acute and chronic changes in health conditions. We present the results, lessons, and insights learned, and how to address the challenge of lack of labels. The study confirms the value of long-term heart rate data for health monitoring and surveillance, as complementary to extensive yet intermittent examinations by health care providers.

     
    more » « less
  2. Free, publicly-accessible full text available July 21, 2025
  3. Active learning (AL) aims to improve model performance within a fixed labeling budget by choosing the most informative data points to label. Existing AL focuses on the single-domain setting, where all data come from the same domain (e.g., the same dataset). However, many real-world tasks often involve multiple domains. For example, in visual recognition, it is often desirable to train an image classifier that works across different environments (e.g., different backgrounds), where images from each environment constitute one domain. Such a multi-domain AL setting is challenging for prior methods because they (1) ignore the similarity among different domains when assigning labeling budget and (2) fail to handle distribution shift of data across different domains. In this paper, we propose the first general method, dubbed composite active learning (CAL), for multi-domain AL. Our approach explicitly considers the domain-level and instance-level information in the problem; CAL first assigns domain-level budgets according to domain-level importance, which is estimated by optimizing an upper error bound that we develop; with the domain-level budgets, CAL then leverages a certain instance-level query strategy to select samples to label from each domain. Our theoretical analysis shows that our method achieves a better error bound compared to current AL methods. Our empirical results demonstrate that our approach significantly outperforms the state-of-the-art AL methods on both synthetic and real-world multi-domain datasets. Code is available at https://github.com/Wang-ML-Lab/multi-domain-active-learning.

     
    more » « less
    Free, publicly-accessible full text available March 25, 2025
  4. Mulzer, Wolfgang ; Phillips, Jeff M (Ed.)
    Finding the diameter of a graph in general cannot be done in truly subquadratic assuming the Strong Exponential Time Hypothesis (SETH), even when the underlying graph is unweighted and sparse. When restricting to concrete classes of graphs and assuming SETH, planar graphs and minor-free graphs admit truly subquadratic algorithms, while geometric intersection graphs of unit balls, congruent equilateral triangles, and unit segments do not. Unit-disk graphs is one of the major open cases where the complexity of diameter computation remains unknown. More generally, it is conjectured that a truly subquadratic time algorithm exists for pseudo-disk graphs where each pair of objects has at most two intersections on the boundary. In this paper, we show a truly-subquadratic algorithm of running time O^~(n^{2-1/18}), for finding the diameter in a unit-disk graph, whose output differs from the optimal solution by at most 2. This is the first algorithm that provides an additive guarantee in distortion, independent of the size or the diameter of the graph. Our algorithm requires two important technical elements. First, we show that for the intersection graph of pseudo-disks, the graph VC-dimension - either of k-hop balls or the distance encoding vectors - is 4. This contrasts to the VC dimension of the pseudo-disks themselves as geometric ranges (which is known to be 3). Second, we introduce a clique-based r-clustering for geometric intersection graphs, which is an analog of the r-division construction for planar graphs. We also showcase the new techniques by establishing new results for distance oracles for unit-disk graphs with subquadratic storage and O(1) query time. The results naturally extend to unit L₁ or L_∞-disks and fat pseudo-disks of similar size. Last, if the pseudo-disks additionally have bounded ply, we have a truly subquadratic algorithm to find the exact diameter. 
    more » « less
  5. Bringmann, Karl ; Grohe, Martin ; Puppis, Gabriele ; Svensson, Ola (Ed.)
    The hereditary discrepancy of a set system is a quantitative measure of the pseudorandom properties of the system. Roughly speaking, hereditary discrepancy measures how well one can 2-color the elements of the system so that each set contains approximately the same number of elements of each color. Hereditary discrepancy has numerous applications in computational geometry, communication complexity and derandomization. More recently, the hereditary discrepancy of the set system of shortest paths has found applications in differential privacy [Chen et al. SODA 23]. The contribution of this paper is to improve the upper and lower bounds on the hereditary discrepancy of set systems of unique shortest paths in graphs. In particular, we show that any system of unique shortest paths in an undirected weighted graph has hereditary discrepancy O(n^{1/4}), and we construct lower bound examples demonstrating that this bound is tight up to polylog n factors. Our lower bounds hold even for planar graphs and bipartite graphs, and improve a previous lower bound of Ω(n^{1/6}) obtained by applying the trace bound of Chazelle and Lvov [SoCG'00] to a classical point-line system of Erdős. As applications, we improve the lower bound on the additive error for differentially-private all pairs shortest distances from Ω(n^{1/6}) [Chen et al. SODA 23] to Ω̃(n^{1/4}), and we improve the lower bound on additive error for the differentially-private all sets range queries problem to Ω̃(n^{1/4}), which is tight up to polylog n factors [Deng et al. WADS 23]. 
    more » « less
  6. We propose new differential privacy solutions for when external invariants and integer constraints are simultaneously enforced on the data product. These requirements arise in real world applications of private data curation, including the public release of the 2020 U.S. Decennial Census. They pose a great challenge to the production of provably private data products with adequate statistical usability. We propose integer subspace differential privacy to rigorously articulate the privacy guarantee when data products maintain both the invariants and integer characteristics, and demonstrate the composition and post-processing properties of our proposal. To address the challenge of sampling from a potentially highly restricted discrete space, we devise a pair of unbiased additive mechanisms, the generalized Laplace and the generalized Gaussian mechanisms, by solving the Diophantine equations as defined by the constraints. The proposed mechanisms have good accuracy, with errors exhibiting sub-exponential and sub-Gaussian tail probabilities respectively. To implement our proposal, we design an MCMC algorithm and supply empirical convergence assessment using estimated upper bounds on the total variation distance via L-lag coupling. We demonstrate the efficacy of our proposal with applications to a synthetic problem with intersecting invariants, a sensitive contingency table with known margins, and the 2010 Census county-level demonstration data with mandated fixed state population totals. 
    more » « less
  7. Many data applications have certain invariant constraints due to practical needs. Data curators who employ differential privacy need to respect such constraints on the sanitized data product as a primary utility requirement. Invariants challenge the formulation, implementation, and interpretation of privacy guarantees. We propose subspace differential privacy, to honestly characterize the dependence of the sanitized output on confidential aspects of the data. We discuss two design frameworks that convert well-known differentially private mechanisms, such as the Gaussian and the Laplace mechanisms, to subspace differentially private ones that respect the invariants specified by the curator. For linear queries, we discuss the design of near-optimal mechanisms that minimize the mean squared error. Subspace differentially private mechanisms rid the need for post-processing due to invariants, preserve transparency and statistical intelligibility of the output, and can be suitable for distributed implementation. We showcase the proposed mechanisms on the 2020 Census Disclosure Avoidance demonstration data, and a spatio-temporal dataset of mobile access point connections on a large university campus. 
    more » « less