skip to main content


Search for: All records

Creators/Authors contains: "Kang, Jian"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application, given its ability to depict the global metabolic pattern in biological samples. However, the data are noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches. Some existing methods attempt to reduce the matching uncertainty, but are far from being able to remove the uncertainty for most features. The existence of the uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection and functional analysis. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels of feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.

     
    more » « less
  2. ASTRACT

    Brain-effective connectivity analysis quantifies directed influence of one neural element or region over another, and it is of great scientific interest to understand how effective connectivity pattern is affected by variations of subject conditions. Vector autoregression (VAR) is a useful tool for this type of problems. However, there is a paucity of solutions when there is measurement error, when there are multiple subjects, and when the focus is the inference of the transition matrix. In this article, we study the problem of transition matrix inference under the high-dimensional VAR model with measurement error and multiple subjects. We propose a simultaneous testing procedure, with three key components: a modified expectation-maximization (EM) algorithm, a test statistic based on the tensor regression of a bias-corrected estimator of the lagged auto-covariance given the covariates, and a properly thresholded simultaneous test. We establish the uniform consistency for the estimators of our modified EM, and show that the subsequent test achieves both a consistent false discovery control, and its power approaches one asymptotically. We demonstrate the efficacy of our method through both simulations and a brain connectivity study of task-evoked functional magnetic resonance imaging.

     
    more » « less
  3. Free, publicly-accessible full text available January 1, 2025
  4. Free, publicly-accessible full text available July 24, 2024
  5. Abstract

    Varying coefficient models have been used to explore dynamic effects in many scientific areas, such as in medicine, finance, and epidemiology. As most existing models ignore the existence of zero regions, we propose a new soft-thresholded varying coefficient model, where the coefficient functions are piecewise smooth with zero regions. Our new modeling approach enables us to perform variable selection, detect the zero regions of selected variables, obtain point estimates of the varying coefficients with zero regions, and construct a new type of sparse confidence intervals that accommodate zero regions. We prove the asymptotic properties of the estimator, based on which we draw statistical inference. Our simulation study reveals that the proposed sparse confidence intervals achieve the desired coverage probability. We apply the proposed method to analyze a large-scale preoperative opioid study.

     
    more » « less
  6. Algorithmic fairness is becoming increasingly important in data mining and machine learning. Among others, a foundational notation is group fairness. The vast majority of the existing works on group fairness, with a few exceptions, primarily focus on debiasing with respect to a single sensitive attribute, despite the fact that the co-existence of multiple sensitive attributes (e.g., gender, race, marital status, etc.) in the real-world is commonplace. As such, methods that can ensure a fair learning outcome with respect to all sensitive attributes of concern simultaneously need to be developed. In this paper, we study the problem of information-theoretic intersectional fairness (InfoFair), where statistical parity, a representative group fairness measure, is guaranteed among demographic groups formed by multiple sensitive attributes of interest. We formulate it as a mutual information minimization problem and propose a generic end-to-end algorithmic framework to solve it. The key idea is to leverage a variational representation of mutual information, which considers the variational distribution between learning outcomes and sensitive attributes, as well as the density ratio between the variational and the original distributions. Our proposed framework is generalizable to many different settings, including other statistical notions of fairness, and could handle any type of learning task equipped with a gradientbased optimizer. Empirical evaluations in the fair classification task on three real-world datasets demonstrate that our proposed framework can effectively debias the classification results with minimal impact to the classification accuracy. 
    more » « less
  7. In a connected world, fair graph learning is becoming increasingly important because of the growing concerns about bias. Yet, the vast majority of existing works assume that the input graph comes from a single view while ignoring the multi-view essence of graphs. Generally speaking, the bias in graph mining is often rooted in the input graph and is further introduced or even amplified by the graph mining model. It thus poses critical research questions regarding the intrinsic relationships of fairness on different views and the possibility of mitigating bias on multiple views simultaneously. To answer these questions, in this paper, we explore individual fairness in multi-view graph mining. We first demonstrate the necessity of fair multi-view graph learning. Building upon the optimization perspective of fair single-view graph mining, we then formulate our problem as a linear weighted optimization problem. In order to figure out the weight of each view, we resort to the minimax Pareto fairness, which is closely related to the Rawlsian difference principle, and propose an effective solver named iFiG that minimizes the utility loss while promoting individual fairness for each view with two different instantiations. The extensive experiments that we conduct in the application of multi-view spectral clustering and INFORM post-processing demonstrate the efficacy of our proposed method in individual bias mitigation. 
    more » « less