Search for: All records

Creators/Authors contains: "Zhang, Kun"

  1. We reveal and address the frequently overlooked yet important issue of disguised procedural unfairness, namely, potentially inadvertent alterations to the behavior of neutral (i.e., not problematic) aspects of the data generating process, and/or the lack of procedural assurance of the greatest benefit of the least advantaged individuals. Inspired by John Rawls's advocacy for pure procedural justice, we view automated decision-making as a microcosm of social institutions, and consider how the data generating process itself can satisfy the requirements of procedural fairness. We propose a framework that decouples the objectionable data generating components from the neutral ones by utilizing reference points and the associated value instantiation rule. Our findings highlight the necessity of preventing disguised procedural unfairness, drawing attention not only to the objectionable data generating components that we aim to mitigate but also, more importantly, to the neutral components that we intend to keep unaffected.
    Free, publicly-accessible full text available May 7, 2025
  2. Free, publicly-accessible full text available May 1, 2025
  3. Abstract

    The ion foreshock, filled with backstreaming foreshock ions, is very dynamic with many transient structures that disturb the bow shock and the magnetosphere‐ionosphere system. It has been shown that foreshock ions can be generated through either solar wind reflection at the bow shock or leakage from the magnetosheath. While solar wind reflection is widely believed to be the dominant generation process, our investigation using Time History of Events and Macroscale Interactions during Substorms mission observations reveals that the relative importance of magnetosheath leakage has been underestimated. We show from case studies that when the magnetosheath ions exhibit field‐aligned anisotropy, a large fraction of them attains sufficient field‐aligned speed to escape upstream, resulting in very high foreshock ion density. The observed foreshock ion density, velocity, phase space density, and distribution function shape are consistent with such an escape or leakage process. Our results suggest that magnetosheath leakage could be a significant contributor to the formation of the ion foreshock. Further characterization of the magnetosheath leakage process is a critical step toward building predictive models of the ion foreshock, a necessary step to better forecast foreshock‐driven space weather effects.
    Free, publicly-accessible full text available February 1, 2025
  4. Sparse online learning has received extensive attention during the past few years. Most existing algorithms that utilize ℓ1-norm regularization or ℓ1-ball projection assume that the feature space is fixed or changes according to explicit constraints. However, this assumption does not hold in many real applications. Motivated by this observation, we propose a new online learning algorithm tailored for data streams described by open feature spaces, where new features can emerge and old features may vanish over various time spans. Our algorithm, named RSOL, provides a strategy to adapt quickly to such feature dynamics by encouraging sparse model representation with an ℓ1- and ℓ2-mixed regularizer. We leverage the proximal operator of the ℓ1,2-mixed norm and show that our RSOL algorithm enjoys a closed-form solution at each iteration. Our theoretical analysis guarantees a sub-linear regret bound for the proposed algorithm. Empirical results benchmarked on nine streaming datasets validate the effectiveness of the proposed RSOL method over three state-of-the-art algorithms. Keywords: online learning, sparse learning, streaming feature selection, open feature spaces, ℓ1,2 mixed norm
  5. Identifying latent variables and causal structures from observational data is essential to many real-world applications involving biological data, medical data, and unstructured data such as images and languages. However, this task can be highly challenging, especially when observed variables are generated by causally related latent variables and the relationships are nonlinear. In this work, we investigate the identification problem for nonlinear latent hierarchical causal models in which observed variables are generated by a set of causally related latent variables, and some latent variables may not have observed children. We show that the identifiability of causal structures and latent variables (up to invertible transformations) can be achieved under mild assumptions: on causal structures, we allow for multiple paths between any pair of variables in the graph, which relaxes latent tree assumptions in prior work; on structural functions, we permit general nonlinearity and multi-dimensional continuous variables, alleviating existing work's parametric assumptions. Specifically, we first develop an identification criterion in the form of novel identifiability guarantees for an elementary latent variable model. Leveraging this criterion, we show that both causal structures and latent variables of the hierarchical model can be identified asymptotically by explicitly constructing an estimation procedure. To the best of our knowledge, our work is the first to establish identifiability guarantees for both causal structures and latent variables in nonlinear latent hierarchical models. 
  6. The noise transition matrix plays a central role in the problem of learning with noisy labels. Among other reasons, many existing solutions rely on access to it. Identifying and estimating the transition matrix without ground truth labels is a critical and challenging task. When label noise transition depends on each instance, the problem of identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, the field lacks a unified understanding of when such a problem remains identifiable. The goal of this paper is to characterize the identifiability of the label noise transition matrix. Building on Kruskal's identifiability results, we show the necessity of multiple noisy labels in identifying the noise transition matrix for the generic case at the instance level. We further instantiate the results to explain the successes of state-of-the-art solutions and how additional assumptions alleviate the requirement of multiple noisy labels. Our result also reveals that disentangled features are helpful in the above identification task, and we provide empirical evidence.
  7. Given an algorithmic predictor that is accurate on some source population consisting of strategic human decision subjects, will it remain accurate if the population responds to it? In our setting, an agent or a user corresponds to a sample (X,Y) drawn from a distribution and faces a model h and its classification result h(X). Agents can modify X to adapt to h, which induces a distribution shift on (X,Y). Our formulation is motivated by applications where the deployed machine learning models are subjected to human agents, and will ultimately face responsive and interactive data distributions. We formalize the discussions of the transferability of a model by studying how the performance of the model trained on the available source distribution (data) would translate to the performance on its induced domain. We provide both upper bounds for the performance gap due to the induced domain shift and lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution. We provide further instantiated analysis for two popular domain adaptation settings, including covariate shift and target shift.
  8. Li, Zhiming (Ed.)

    Acetylation of lysine residues is an important and common post-translational regulatory mechanism occurring on thousands of non-histone proteins. Lysine deacetylases (KDACs or HDACs) are a family of enzymes responsible for removing acetylation. To identify the biological mechanisms regulated by individual KDACs, we created HT1080 cell lines containing chromosomal point mutations, which endogenously express either KDAC6 or KDAC8 with a single inactivated catalytic domain. Engineered HT1080 cells expressing inactive KDAC6 or KDAC8 domains remained viable and exhibited enhanced acetylation on known substrate proteins. RNA-seq analysis revealed many changes in gene expression when KDACs were inactivated, and these gene sets differed significantly from those of knockdown and knockout cell lines. Using Gene Ontology (GO) analysis, we identified several critical biological processes associated specifically with catalytic activity and others attributable to non-catalytic interactions. Treatment of wild-type cells with KDAC-specific inhibitors Tubastatin A and PCI-34051 resulted in gene expression changes distinct from those of the engineered cell lines, validating this approach as a tool for evaluating in-cell inhibitor specificity and identifying off-target effects of KDAC inhibitors. Probing the functions of specific KDAC domains using these cell lines is not equivalent to doing so using previously existing methods and provides novel insight into the catalytic functions of individual KDACs by investigating the molecular and cellular changes upon genetic inactivation.
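
As an illustration of the closed-form proximal step mentioned in entry 4 (RSOL), the following is a minimal sketch of the standard proximal operator of an ℓ1,2 mixed norm, often called group soft-thresholding. It is not the authors' implementation: the abstract does not specify how weights are grouped, so the sketch assumes each row of a weight matrix is one group. The function name `prox_l12`, the parameter `lam`, and the example values are hypothetical.

```python
import math


def prox_l12(rows, lam):
    """Group soft-thresholding: prox of lam * sum_i ||rows[i]||_2.

    Assumption (not from the abstract): each row is one group.
    """
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        # A row whose Euclidean norm is at most lam is zeroed out entirely;
        # otherwise the whole row is shrunk toward zero by a common factor.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        out.append([scale * v for v in row])
    return out


# Hypothetical weights: one large-norm row and one small-norm row.
weights = [[3.0, 4.0], [0.3, 0.4]]
shrunk = prox_l12(weights, lam=1.0)
```

With these example values, the first row (norm 5) is scaled by 0.8 and the second row (norm 0.5, below the threshold) becomes all zeros. This row-level zeroing is what makes a mixed norm encourage sparse models, here at the level of whole groups.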