
Search for: All records

Creators/Authors contains: "Liu, Fang"


  1. Abstract

    Background

    Large amounts of data of various types were collected during the COVID-19 pandemic, and their analysis and understanding have been indispensable for curbing the spread of the disease. As the pandemic moves to an endemic state, the data collected during the pandemic will remain rich sources for further studying and understanding its impacts on various aspects of our society. On the other hand, naïve release and sharing of this information can raise serious privacy concerns.


    Methods

    We use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to illustrate the publication and sharing of granular information and individual-level pandemic data in a privacy-preserving manner. We leverage and build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. We investigate the inferential utility of the privacy-preserving information through simulation studies at different levels of privacy guarantees and demonstrate the approaches on real-life data. All the approaches employed in the study are straightforward to apply.


    Results

    The empirical studies in all three data cases suggest that privacy-preserving results based on the differentially private sanitized data can be similar to the original results at a reasonably small privacy loss ($\epsilon \approx 1$). Statistical inferences based on sanitized data using the multiple synthesis technique also appear valid, with nominal coverage of 95% confidence intervals when there is no noticeable bias in point estimation. When $\epsilon < 1$ and the sample size is not large enough, some privacy-preserving results are subject to bias, partially due to the bounding applied to sanitized data as a post-processing step to satisfy practical data constraints.


    Conclusions

    Our study generates statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees, and on how to balance the statistical utility of released information during this process.

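The kind of sanitization the abstract describes can be illustrated with a minimal sketch of the Laplace mechanism applied to tabular case counts, including the bounding post-processing step the abstract notes can introduce bias at small epsilon. The counts, epsilon, and sensitivity below are illustrative, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2023)

def sanitize_counts(counts, epsilon, sensitivity=1.0):
    # Laplace mechanism: noise scale = sensitivity / epsilon
    scale = sensitivity / epsilon
    noisy = counts + rng.laplace(0.0, scale, size=len(counts))
    # Bounding as a post-processing step: clamp to non-negative
    # integers so the released table satisfies practical constraints
    return np.clip(np.round(noisy), 0, None)

daily_cases = np.array([120.0, 98.0, 143.0, 110.0, 87.0])  # illustrative
released = sanitize_counts(daily_cases, epsilon=1.0)
```

Post-processing such as clipping preserves the differential-privacy guarantee, which is why it can be applied freely after noise addition.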
  2. Free, publicly-accessible full text available January 20, 2024
  3. Free, publicly-accessible full text available February 15, 2024
  4. Approximating probability distributions can be a challenging task, particularly when they are supported over regions of high geometrical complexity or exhibit multiple modes. Annealing can facilitate this task and is often combined with constant, a priori selected increments in inverse temperature. However, constant increments limit computational efficiency because they cannot adapt to situations where smooth changes in the annealed density could be handled equally well with larger increments. We introduce AdaAnn, an adaptive annealing scheduler that automatically adjusts the temperature increments based on the expected change in the Kullback-Leibler divergence between two distributions with sufficiently close annealing temperatures. AdaAnn is easy to implement and can be integrated into existing sampling approaches such as normalizing flows for variational inference and Markov chain Monte Carlo. We demonstrate the computational efficiency of the AdaAnn scheduler for variational inference with normalizing flows on a number of examples, including posterior estimation of parameters for dynamical systems and probability density approximation in multimodal and high-dimensional settings.
    Free, publicly-accessible full text available January 1, 2024
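A minimal sketch of the adaptive-increment idea, assuming an annealed density of the form p_t ∝ prior · likelihood^t and a second-order expansion of the KL divergence in the temperature increment; the tolerance and step cap are illustrative, not the paper's defaults:

```python
import numpy as np

def adaann_increment(log_like, tol=0.01, max_step=0.5):
    # Second-order expansion: KL(p_t || p_{t+dt}) ≈ dt^2 * Var[log L] / 2,
    # so solving KL = tol for dt gives dt = sqrt(2 * tol) / sd(log L),
    # with the standard deviation estimated from samples drawn at the
    # current annealing temperature.
    sd = np.std(log_like)
    if sd == 0.0:
        return max_step      # flat likelihood: take the largest allowed step
    return min(np.sqrt(2.0 * tol) / sd, max_step)
```

High variance in the log-likelihood (the density is changing quickly with temperature) yields small increments; low variance yields large ones, which is exactly the adaptivity that fixed schedules lack.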
  5. Free, publicly-accessible full text available January 1, 2024
  6. Fast inference of numerical model parameters from data is an important prerequisite for generating predictive models in a wide range of applications. Sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flows are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and they rely on gradient-based optimization instead of sampling, providing a more efficient approach to Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline-trained surrogate model, such as a neural network. However, this approach can introduce significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and the surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high-posterior-density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available online.
    Free, publicly-accessible full text available October 15, 2023
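The sample-weighting idea can be sketched with a toy one-dimensional model standing in for the expensive solver: training points are up-weighted by their (unnormalized) posterior density so the surrogate stays accurate where inference concentrates. The model, observation, noise level, and polynomial surrogate below are all illustrative, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_model(z):               # toy stand-in for a costly solver
    return np.sin(z) + 0.1 * z**2

zs = rng.uniform(-3.0, 3.0, 40)       # inputs already evaluated "offline"
ys = expensive_model(zs)

# Unnormalized log-posterior at each training point: Gaussian likelihood
# around a synthetic observation plus a standard-normal prior on z.
y_obs, noise = expensive_model(1.2), 0.2
log_post = -0.5 * ((ys - y_obs) / noise) ** 2 - 0.5 * zs**2
weights = np.exp(log_post - log_post.max())     # in (0, 1], max is 1

# Weighted least-squares fit; the small floor keeps every point in play
# so global accuracy is not lost entirely.
surrogate = np.poly1d(np.polyfit(zs, ys, deg=5, w=weights + 1e-3))
```

In NoFAS this weighted refit alternates with updates to the normalizing-flow parameters, so the surrogate is refreshed around wherever the posterior approximation currently places its mass.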
  7. Interactions of quantum materials with strong laser fields can induce exotic non-equilibrium electronic states. Monolayer transition metal dichalcogenides, a new class of direct-gap semiconductors with prominent quantum confinement, offer exceptional opportunities for the Floquet engineering of excitons, which are quasiparticle electron–hole correlated states. Strong-field driving has the potential to achieve enhanced control of the electronic band structure and thus the possibility of opening a new realm of exciton light–matter interactions. However, a full characterization of strong-field driven exciton dynamics has been difficult. Here we use mid-infrared laser pulses below the optical bandgap to excite monolayer tungsten disulfide and demonstrate strong-field light dressing of excitons in excess of a hundred millielectronvolts. Our high-sensitivity transient absorption spectroscopy further reveals the formation of a virtual absorption feature below the 1s-exciton resonance, which we assign to a light-dressed sideband from the dark 2p-exciton state. Quantum-mechanical simulations substantiate the experimental results and enable us to retrieve real-space movies of the exciton dynamics. This study advances our understanding of exciton dynamics in the strong-field regime, showing the possibility of harnessing ultrafast, strong-field phenomena in device applications of two-dimensional materials.
    Free, publicly-accessible full text available January 5, 2024
  8. von Davier, Matthias (Ed.)
    Computerized assessment provides rich multidimensional data, including trial-by-trial accuracy and response time (RT) measures. A key question in modeling this type of data is how to incorporate RT data, for example, in aid of ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for corresponding paper-and-pencil scores. We then reformulate and reparameterize the model to capture the relationship between the model parameters, to facilitate the prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on the decomposition of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria can quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we have conducted several simulation studies to examine the empirical performance of the proposed model assessment criteria and have illustrated the application of these criteria using a real dataset from a computerized educational assessment program.
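Simulating from a joint model of this general shape — a two-parameter (2PL) IRT model for correctness plus a log-normal model for response times — might look as follows; all parameter values and the speed/intensity parameterization of the RT component are illustrative, not the paper's exact specification:

```python
import numpy as np

rng = np.random.default_rng(42)

n_persons, n_items = 500, 20
theta = rng.standard_normal(n_persons)          # latent abilities
a = rng.uniform(0.8, 2.0, n_items)              # item discriminations
b = rng.normal(0.0, 1.0, n_items)               # item difficulties

# 2PL IRT: P(correct) = logistic(a_j * (theta_i - b_j))
logit = a * (theta[:, None] - b)
p_correct = 1.0 / (1.0 + np.exp(-logit))
y = rng.random((n_persons, n_items)) < p_correct  # dichotomous responses

# Log-normal RT: log T_ij = beta_j - tau_i + eps_ij, eps ~ N(0, sigma^2)
beta = rng.normal(1.0, 0.3, n_items)            # item time intensities
tau = rng.normal(0.0, 0.5, n_persons)           # person speed
log_rt = beta - tau[:, None] + rng.normal(0.0, 0.3, (n_persons, n_items))
rt = np.exp(log_rt)
```

A joint Bayesian fit would then link theta and tau (and possibly the paper-and-pencil scores) through a common covariance structure, which is where the reparameterization the abstract mentions pays off.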
  9. Summary

    A treatment regime is a sequence of decision rules, one per decision point, that maps accumulated patient information to a recommended intervention. An optimal treatment regime maximises expected cumulative utility if applied to select interventions in a population of interest. As a treatment regime seeks to improve the quality of healthcare by individualising treatment, it can be viewed as an approach to formalising precision medicine. Increased interest and investment in precision medicine has led to a surge of methodological research focusing on estimation and evaluation of optimal treatment regimes from observational and/or randomised studies. These methods are becoming commonplace in biomedical research, although guidance about how to choose among existing methods in practice has been somewhat limited. The purpose of this review is to describe some of the most commonly used methods for estimation of an optimal treatment regime, and to compare these estimators in a series of simulation experiments and applications to real data. The results of these simulations along with the theoretical/methodological properties of these estimators are used to form recommendations for applied researchers.

    Free, publicly-accessible full text available February 22, 2024
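One of the commonly used estimators such a review typically covers, single-stage Q-learning, can be sketched with simulated data; the outcome model, coefficients, and sample size below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 2000
x = rng.standard_normal(n)                # patient covariate
trt = rng.integers(0, 2, n)               # randomized binary treatment
# True outcome model: treatment helps when x > 0, harms when x < 0
y = 1.0 + 0.5 * x + trt * (1.5 * x) + rng.normal(0.0, 1.0, n)

# Q-learning: fit Q(x, a) = b0 + b1*x + a*(c0 + c1*x) by least squares
design = np.column_stack([np.ones(n), x, trt, trt * x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
c0, c1 = coef[2], coef[3]

def recommend(x_new):
    # Estimated regime: treat when the fitted treatment effect is positive
    return (c0 + c1 * np.asarray(x_new) > 0).astype(int)
```

The estimated rule recovers the sign structure of the true effect, i.e. it recommends treatment for patients with large positive x and withholds it for large negative x; misspecification of the Q-function is exactly the vulnerability that motivates the alternative (e.g. value-search) estimators such reviews compare.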