

Search for: All records

Creators/Authors contains: "Zhang, Kun"


  1. The noise transition matrix plays a central role in the problem of learning with noisy labels. Among other reasons, many existing solutions rely on access to it. Identifying and estimating the transition matrix without ground truth labels is a critical and challenging task. When the label noise transition depends on each instance, the problem of identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, the field lacks a unified understanding of when such a problem remains identifiable. The goal of this paper is to characterize the identifiability of the label noise transition matrix. Building on Kruskal's identifiability results, we show the necessity of multiple noisy labels in identifying the noise transition matrix for the generic case at the instance level. We further instantiate the results to explain the successes of the state-of-the-art solutions and how additional assumptions alleviate the requirement of multiple noisy labels. Our result also reveals that disentangled features are helpful in the above identification task, and we provide empirical evidence.
    Free, publicly-accessible full text available July 24, 2024
  2. Li, Zhiming (Ed.)

    Acetylation of lysine residues is an important and common post-translational regulatory mechanism occurring on thousands of non-histone proteins. Lysine deacetylases (KDACs or HDACs) are a family of enzymes responsible for removing acetylation. To identify the biological mechanisms regulated by individual KDACs, we created HT1080 cell lines containing chromosomal point mutations, which endogenously express either KDAC6 or KDAC8 having a single inactivated catalytic domain. Engineered HT1080 cells expressing inactive KDAC6 or KDAC8 domains remained viable and exhibited enhanced acetylation on known substrate proteins. RNA-seq analysis revealed many changes in gene expression when KDACs were inactivated, and these gene sets differed significantly from those of knockdown and knockout cell lines. Using Gene Ontology (GO) analysis, we identified several critical biological processes associated specifically with catalytic activity and others attributable to non-catalytic interactions. Treatment of wild-type cells with KDAC-specific inhibitors Tubastatin A and PCI-34051 resulted in gene expression changes distinct from those of the engineered cell lines, validating this approach as a tool for evaluating in-cell inhibitor specificity and identifying off-target effects of KDAC inhibitors. Probing the functions of specific KDAC domains using these cell lines is not equivalent to doing so using previously existing methods and provides novel insight into the catalytic functions of individual KDACs by investigating the molecular and cellular changes upon genetic inactivation.

     
    Free, publicly-accessible full text available September 18, 2024
  3. Given an algorithmic predictor that is accurate on some source population consisting of strategic human decision subjects, will it remain accurate if the population responds to it? In our setting, an agent or a user corresponds to a sample (X,Y) drawn from a distribution and will face a model h and its classification result h(X). Agents can modify X to adapt to h, which will incur a distribution shift on (X,Y). Our formulation is motivated by applications where the deployed machine learning models are subjected to human agents, and will ultimately face responsive and interactive data distributions. We formalize the discussions of the transferability of a model by studying how the performance of the model trained on the available source distribution (data) would translate to the performance on its induced domain. We provide both upper bounds for the performance gap due to the induced domain shift and lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution. We provide further instantiated analysis for two popular domain adaptation settings: covariate shift and target shift.
    Free, publicly-accessible full text available July 24, 2024
  4. Free, publicly-accessible full text available May 1, 2024
  5. Free, publicly-accessible full text available May 1, 2024
  6. The pursuit of long-term fairness involves the interplay between decision-making and the underlying data generating process. In this paper, through causal modeling with a directed acyclic graph (DAG) on the decision-distribution interplay, we investigate the possibility of achieving long-term fairness from a dynamic perspective. We propose Tier Balancing, a technically more challenging but more natural notion to achieve in the context of long-term, dynamic fairness analysis. Different from previous fairness notions that are defined purely on observed variables, our notion goes one step further, capturing behind-the-scenes situation changes on the unobserved latent causal factors that directly carry out the influence from the current decision to the future data distribution. Under the specified dynamics, we prove that in general one cannot achieve the long-term fairness goal only through one-step interventions. Furthermore, in the effort of approaching long-term fairness, we consider the mission of "getting closer to" the long-term fairness goal and present possibility and impossibility results accordingly. 
  7. The bow-and-arrow Mesoscale Convective System (MCS) has a unique structure with two convective lines resembling the shape of an archer’s bow and arrow. These MCSs and their arrow convection (located behind the MCS leading line) can produce hazardous winds and flooding extending over hundreds of kilometers, which are often poorly predicted in operational forecasts. This study examines the dynamics of a bow-and-arrow MCS observed over the Yangtze–Huai Plains of China, with a focus on the arrow convection. The analysis applied backward trajectories and Lagrangian vertical momentum budgets to simulations employing the WRF-ARW and CM1 models. Cells within the arrow in the WRF-ARW simulations of the MCS were elevated, initially forming as convectively unstable air within the low-level jet (LLJ), which gently ascended over the cold pool and converged with the MCS’s mesoscale convective vortex (MCV) at higher altitudes. The subsequent ascent in these cells was enhanced by dynamic pressure forcing due to the updraft being within a layer where the vertical shear changed with height due to the superposition of the LLJ and the MCV. These dynamic forcings initially played a larger role in the ascent than the parcel’s buoyancy. These findings were bolstered by idealized simulations employing the CM1 model. These results illustrate a challenge for accurately forecasting bow-and-arrow MCSs, as the updraft magnitude depends on dynamical forcing associated with the interaction between the vertical shear of the environment and that due to convectively generated circulations.
  8. Recently, many regression-based conditional independence (CI) test methods have been proposed to solve the problem of causal discovery. These methods provide alternatives to test CI by first removing the information of the controlling set from the two target variables, and then testing the independence between the corresponding residuals Res1 and Res2. When the residuals are linearly uncorrelated, the independence test between them is nontrivial. With the ability to calculate inner products in high-dimensional space, kernel-based methods are usually used to achieve this goal, but they still consume considerable time. In this paper, we investigate the independence between two linear combinations under a linear non-Gaussian structural equation model. We show that the dependence between the two residuals can be captured by the difference between the similarity of (Res1, Res2) and that of (Res1, Res3) (Res3 is generated by random permutation) in high-dimensional space. With this result, we design a new method called SCIT for CI testing, where a permutation test is performed to control the Type I error rate. The proposed method is simpler yet more efficient and effective than the existing ones. When applied to causal discovery, the proposed method outperforms its counterparts in terms of both speed and Type II error rate, especially in the case of small sample size, which is validated by our extensive experiments on various datasets.
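The generic recipe described in record 8 (regress out the controlling set, then test independence of the two residuals with a permutation test) can be sketched as follows. This is a minimal illustration and not the SCIT method itself: the dependence statistic used here (the larger of the absolute correlation between the residuals and between their squares) is an assumed stand-in, and the function names `residuals`, `dependence_stat`, and `permutation_ci_test` are hypothetical.

```python
import numpy as np


def residuals(target, controls):
    """Least-squares residuals of `target` after regressing on `controls` plus an intercept."""
    design = np.column_stack([np.ones(len(target)), controls])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def dependence_stat(r1, r2):
    """Assumed stand-in statistic (not SCIT): max absolute correlation of
    the residuals and of the squared residuals."""
    linear = abs(np.corrcoef(r1, r2)[0, 1])
    squared = abs(np.corrcoef(r1 ** 2, r2 ** 2)[0, 1])
    return max(linear, squared)


def permutation_ci_test(x, y, z, n_perm=500, seed=0):
    """Test X independent of Y given Z; returns (observed statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    res1 = residuals(x, z)
    res2 = residuals(y, z)
    observed = dependence_stat(res1, res2)
    # Permuting res2 plays the role of "Res3": it breaks any pairing with res1.
    null = np.array(
        [dependence_stat(res1, rng.permutation(res2)) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic linear non-Gaussian data (uniform noise), purely illustrative.
    rng = np.random.default_rng(1)
    z = rng.uniform(-1, 1, 500)
    x = z + rng.uniform(-1, 1, 500)
    y = x + z + rng.uniform(-1, 1, 500)  # Y depends on X even after conditioning on Z
    print(permutation_ci_test(x, y, z))
```

Comparing the observed statistic against statistics computed on permuted residuals is what controls the Type I error rate in this sketch; a kernel-based or similarity-based statistic could be substituted for `dependence_stat` without changing the surrounding procedure.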