Search for: All records

Creators/Authors contains: "Wang, Yu"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. Free, publicly-accessible full text available November 23, 2023
  2. Abstract

    While machine learning has emerged in recent years as a useful tool for the rapid prediction of materials properties, generating sufficient data to reliably train models without overfitting is often impractical. Towards overcoming this limitation, we present a general framework for leveraging complementary information across different models and datasets for accurate prediction of data-scarce materials properties. Our approach, based on a machine learning paradigm called mixture of experts, outperforms pairwise transfer learning on 14 of 19 materials property regression tasks, performing comparably on four of the remaining five. The approach is interpretable, model-agnostic, and scalable to combining an arbitrary number of pre-trained models and datasets to any downstream property prediction task. We anticipate the performance of our framework will further improve as better model architectures, new pre-training tasks, and larger materials datasets are developed by the community.
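
    The combination step in a mixture of experts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `mixture_of_experts`, `expert_features`, `gate_logits`, and `head_weights` are hypothetical, and the sketch assumes each pre-trained model (expert) yields a fixed-length feature vector that a softmax gate weights before a linear prediction head.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def mixture_of_experts(expert_features, gate_logits, head_weights, head_bias=0.0):
    """Weight K expert feature vectors with a softmax gate, then apply a linear head.

    expert_features: (K, D) array, one feature vector per expert (assumed shape).
    gate_logits:     (K,) unnormalized gate scores (hypothetical parameters).
    head_weights:    (D,) weights of an illustrative linear regression head.
    """
    weights = softmax(np.asarray(gate_logits, dtype=float))       # (K,), sums to 1
    mixed = weights @ np.asarray(expert_features, dtype=float)    # (D,) weighted average
    prediction = float(mixed @ head_weights + head_bias)
    return prediction, weights

# Toy example: three experts with 2-D features and equal gate scores.
expert_features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gate_logits = np.array([0.0, 0.0, 0.0])
head_weights = np.array([2.0, 4.0])
prediction, weights = mixture_of_experts(expert_features, gate_logits, head_weights)
```

    Because the gate weights are non-negative and sum to one, inspecting them shows how much each expert contributes to a prediction, which is one way such a model can be read as interpretable.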

  3. Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as a neural network. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at
    Free, publicly-accessible full text available October 15, 2023
  4. Abstract Magnesium, the lightest structural metal, usually exhibits limited ambient plasticity when compressed along its crystallographic c-axis (the “hard” orientation of magnesium). Here we report large plasticity in c-axis compression of submicron magnesium single crystal achieved by a dual-stage deformation. We show that when the plastic flow gradually strain-hardens the magnesium crystal to gigapascal level, at which point dislocation mediated plasticity is nearly exhausted, the sample instantly pancakes without fracture, accompanying a conversion of the initial single crystal into multiple grains that roughly share a common rotation axis. Atomic-scale characterization, crystallographic analyses and molecular dynamics simulations indicate that the new grains can form via transformation of pyramidal to basal planes. We categorize this grain formation as “deformation graining”. The formation of new grains rejuvenates massive dislocation slip and deformation twinning to enable large plastic strains.
    Free, publicly-accessible full text available December 1, 2023
  5. Abstract Background The spatiotemporal variation of observed trace gases (NO₂, SO₂, O₃) and particulate matter (PM2.5, PM10) was investigated over cities of the Yangtze River Delta (YRD) region, including Nanjing, Hefei, Shanghai and Hangzhou. Furthermore, the characteristics of different pollution episodes, i.e., haze events (visibility < 7 km, relative humidity < 80%, and PM2.5 > 40 µg/m³) and complex pollution episodes (PM2.5 > 35 µg/m³ and O₃ > 160 µg/m³), were studied over the cities of the YRD region. The impact of China's Clean Air Action Plan on the concentrations of aerosols and trace gases is examined. The impacts of trans-boundary pollution and different meteorological conditions were also examined. Results The highest annual mean concentrations of PM2.5, PM10, NO₂ and O₃ were found for 2019 over all the cities. The annual mean concentrations of PM2.5, PM10, and NO₂ showed continuous declines from 2019 to 2021 due to emission control measures and implementation of the Clean Air Action Plan over all the cities of the YRD region. The annual mean O₃ levels showed a decline in 2020 over all the cities of the YRD region, which is unprecedented since China's national environmental monitoring program began in 2013. However, a slight increase in annual O₃ was observed in 2021. The highest overall means of PM2.5, PM10, SO₂, and NO₂ were observed over Hefei, whereas the highest O₃ levels were found in Nanjing. Despite the strict control measures, PM2.5 and PM10 concentrations exceeded the Grade-1 National Ambient Air Quality Standards (NAAQS) and World Health Organization (WHO) guidelines over all the cities of the YRD region. The number of haze days was higher in Hefei and Nanjing, whereas complex pollution episodes, i.e., days with concurrent O₃ and PM2.5 pollution, were more frequent in Hangzhou and Shanghai. The in situ data for SO₂ and NO₂ showed strong correlation with Tropospheric Monitoring Instrument (TROPOMI) satellite data. Conclusions Despite the observed reductions in primary pollutant concentrations, the formation of secondary pollutants is still a concern for major metropolises. Higher temperature and lower relative humidity favor the accumulation of O₃, while low temperature, low wind speeds and lower relative humidity favor the accumulation of primary pollutants. This study depicts different air pollution problems for different cities inside a region. Therefore, there is a dire need for continuous monitoring and analysis of air quality parameters and for city-specific policies and action plans to deal effectively with metropolitan pollution.
    Free, publicly-accessible full text available December 1, 2023
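
    The two episode definitions quoted in the abstract above (haze: visibility < 7 km, relative humidity < 80%, PM2.5 > 40 µg/m³; complex pollution: PM2.5 > 35 µg/m³ and O₃ > 160 µg/m³) can be written as a small classifier. This is only a sketch: the function name `classify_episode` and its parameter names are hypothetical, and it assumes one set of concentrations per observation period, since the abstract does not state the averaging window or whether the two categories are mutually exclusive.

```python
def classify_episode(pm25, o3, visibility_km, rel_humidity):
    """Return the episode labels met by one observation.

    pm25, o3:       concentrations in micrograms per cubic meter.
    visibility_km:  visibility in kilometers.
    rel_humidity:   relative humidity in percent.
    Thresholds come from the abstract; allowing both labels at once is an assumption.
    """
    labels = []
    # Haze event: low visibility in relatively dry air with elevated PM2.5.
    if visibility_km < 7 and rel_humidity < 80 and pm25 > 40:
        labels.append("haze")
    # Complex pollution episode: PM2.5 and O3 elevated at the same time.
    if pm25 > 35 and o3 > 160:
        labels.append("complex")
    return labels

# Illustrative values only, not measurements from the study.
hazy_day = classify_episode(pm25=55, o3=90, visibility_km=5, rel_humidity=60)
complex_day = classify_episode(pm25=38, o3=170, visibility_km=12, rel_humidity=50)
clean_day = classify_episode(pm25=20, o3=100, visibility_km=15, rel_humidity=50)
```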
  6. Graph Neural Networks (GNNs) have shown satisfying performance in various graph analytical problems. Hence, they have become the de facto solution in a variety of decision-making scenarios. However, GNNs could yield biased results against certain demographic subgroups. Some recent works have empirically shown that the biased structure of the input network is a significant source of bias for GNNs. Nevertheless, no studies have systematically scrutinized which part of the input network structure leads to biased predictions for any given node. The low transparency on how the structure of the input network influences the bias in GNN outcome largely limits the safe adoption of GNNs in various decision-critical scenarios. In this paper, we study a novel research problem of structural explanation of bias in GNNs. Specifically, we propose a novel post-hoc explanation framework to identify two edge sets that can maximally account for the exhibited bias and maximally contribute to the fairness level of the GNN prediction for any given node, respectively. Such explanations not only provide a comprehensive understanding of bias/fairness of GNN predictions but also have practical significance in building an effective yet fair GNN model. Extensive experiments on real-world datasets validate the effectiveness of the proposed framework towards delivering effective structural explanations for the bias of GNNs. Open-source code can be found at
    Free, publicly-accessible full text available August 14, 2023
  7. Graph Neural Networks (GNNs) have shown great power in learning node representations on graphs. However, they may inherit historical prejudices from training data, leading to discriminatory bias in predictions. Although some work has developed fair GNNs, most of them directly borrow fair representation learning techniques from non-graph domains without considering the potential problem of sensitive attribute leakage caused by feature propagation in GNNs. However, we empirically observe that feature propagation could vary the correlation of previously innocuous non-sensitive features to the sensitive ones. This can be viewed as a leakage of sensitive information which could further exacerbate discrimination in predictions. Thus, we design two feature masking strategies according to feature correlations to highlight the importance of considering feature propagation and correlation variation in alleviating discrimination. Motivated by our analysis, we propose Fair View Graph Neural Network (FairVGNN) to generate fair views of features by automatically identifying and masking sensitive-correlated features considering correlation variation after feature propagation. Given the learned fair views, we adaptively clamp weights of the encoder to avoid using sensitive-related features. Experiments on real-world datasets demonstrate that FairVGNN enjoys a better trade-off between model utility and fairness.
    Free, publicly-accessible full text available August 14, 2023
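
    Correlation-based feature masking, as described in the abstract above, can be illustrated with a short sketch. This is not FairVGNN, which learns its masks automatically; it is a hand-written heuristic under stated assumptions: the function `mask_sensitive_correlated` and the parameter `k` are hypothetical, Pearson correlation is used as the correlation measure, and correlations are computed on whatever feature matrix is passed in (to account for correlation variation, one would pass features after propagation).

```python
import numpy as np

def mask_sensitive_correlated(features, sensitive, k):
    """Zero out the k feature columns most correlated with the sensitive attribute.

    features:  (N, D) node feature matrix.
    sensitive: (N,) sensitive attribute values.
    k:         number of columns to mask (hypothetical hyperparameter).
    Returns the masked features, the 0/1 mask, and the per-column correlations.
    """
    X = np.asarray(features, dtype=float)
    s = np.asarray(sensitive, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature column
    sc = s - s.mean()                            # center the sensitive attribute
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (sc ** 2).sum())
    # Pearson correlation per column; constant columns get correlation 0.
    corr = np.divide(Xc.T @ sc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    top = np.argsort(-np.abs(corr))[:k]          # indices of the k strongest correlations
    mask = np.ones(X.shape[1])
    mask[top] = 0.0
    return X * mask, mask, corr

# Toy data: column 0 duplicates the sensitive attribute, columns 1 and 2 do not.
X = np.array([[0, 1, 5], [1, 1, 3], [0, 2, 4], [1, 2, 6]])
s = np.array([0, 1, 0, 1])
masked, mask, corr = mask_sensitive_correlated(X, s, k=1)
```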
  8. The bistable fluttering response of heavy inverted flags with different aspect ratios ($AR$) is investigated to determine how the vortical structures affect the intermittent vibration response of the flag. A heavy inverted flag in a uniform flow may exhibit several response modes; amongst them are three major modes that occur over an extended velocity range: stationary, large-scale periodic oscillation and one-sided deflected modes. Significant hysteretic bistability is observed at the transition between these modes for all $AR$, which is notably different from the conventional flag vibration with a fixed leading edge and free trailing edge where no hysteresis is observed at the lower $AR$ limit ($AR<1$). The difference is associated with the distinct roles of vortices around the flag. Experiments with flags made of spring steel are conducted in a wind tunnel, where the flow speed is steadily increased and later decreased to obtain different oscillatory modes of the heavy inverted flags. The experimental results are used to validate the numerical model of the same problem. It is found that different critical velocities exist for increasing and decreasing flow velocities, and there is a sustained hysteresis for all $AR$ controlled by the initiation threshold and growth of the leading-edge and side-edge vortices. The effect of the vortices in the bistable oscillation regime is quantified by formulating a modal force partitioning approach. It is shown that $AR$ can significantly alter the static and dynamic vortex interaction with the flexible plate, thereby changing the flag's hysteresis behaviour and bistable response.
    Free, publicly-accessible full text available July 10, 2023
  9. Free, publicly-accessible full text available August 1, 2023
  10. Random Forests (RFs) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative RFs (iRFs) use a tree ensemble from iteratively modified RFs to obtain predictive and stable nonlinear or Boolean interactions of features. They have shown great promise for Boolean biological interaction discovery that is central to advancing functional genomics and precision medicine. However, theoretical studies into how tree-based methods discover Boolean feature interactions are missing. Inspired by the thresholding behavior in many biological processes, we first introduce a discontinuous nonlinear regression model, called the “Locally Spiky Sparse” (LSS) model. Specifically, the LSS model assumes that the regression function is a linear combination of piecewise constant Boolean interaction terms. Given an RF tree ensemble, we define a quantity called “Depth-Weighted Prevalence” (DWP) for a set of signed features $S^{\pm}$. Intuitively speaking, DWP($S^{\pm}$) measures how frequently features in $S^{\pm}$ appear together in an RF tree ensemble. We prove that, with high probability, DWP($S^{\pm}$) attains a universal upper bound that does not involve any model coefficients, if and only if $S^{\pm}$ corresponds to a union of Boolean interactions under the LSS model. Consequently, we show that a theoretically tractable version of the iRF procedure, called LSSFind, yields consistent interaction discovery under the LSS model as the sample size goes to infinity. Finally, simulation results show that LSSFind recovers the interactions under the LSS model, even when some assumptions are violated.
    Free, publicly-accessible full text available May 31, 2023
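
    A regression function of the kind the LSS model describes, a linear combination of piecewise constant Boolean interaction terms, can be sketched as below. The abstract does not give the exact parameterization, so everything specific here is an assumption: the function name `lss_regression`, the encoding of each interaction as a list of `(feature_index, threshold, sign)` triples, and the convention that sign +1 means "feature above threshold" and sign -1 means "feature at or below threshold".

```python
import numpy as np

def lss_regression(X, intercept, interactions):
    """Evaluate a sum of thresholded Boolean interaction terms.

    X:            (N, D) feature matrix.
    intercept:    constant offset.
    interactions: list of (coefficient, terms); terms is a list of
                  (feature_index, threshold, sign) triples (assumed encoding).
    An interaction is active for a row only if all of its signed threshold
    conditions hold, so each term is piecewise constant in X.
    """
    X = np.asarray(X, dtype=float)
    y = np.full(X.shape[0], float(intercept))
    for coef, terms in interactions:
        active = np.ones(X.shape[0], dtype=bool)
        for j, thr, sign in terms:
            condition = (X[:, j] > thr) if sign > 0 else (X[:, j] <= thr)
            active &= condition                  # Boolean AND across the interaction
        y += coef * active                       # adds coef where the interaction fires
    return y

# Toy model: +2.0 when features 0 and 1 are both high, -1.0 when feature 2 is low.
interactions = [
    (2.0, [(0, 0.5, +1), (1, 0.5, +1)]),
    (-1.0, [(2, 0.5, -1)]),
]
X = np.array([[0.9, 0.8, 0.1], [0.9, 0.2, 0.1], [0.1, 0.1, 0.9]])
y = lss_regression(X, intercept=0.5, interactions=interactions)
```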