


Creators/Authors contains: "Li, Xiang"


  1. Abstract Refractory multi-principal element alloys (RMPEAs) are promising materials for high-temperature structural applications. Here, we investigate the role of short-range ordering (SRO) in dislocation glide in the MoNbTi and TaNbTi RMPEAs using a multi-scale modeling approach. Monte Carlo/molecular dynamics simulations with a moment tensor potential show that MoNbTi exhibits a much greater degree of SRO than TaNbTi and that the local composition directly affects the unstable stacking fault energies (USFEs). From mesoscale phase-field dislocation dynamics simulations, we find that increasing SRO leads to higher mean USFEs and a higher stress required for dislocation glide. The gliding dislocations experience significant hardening due to pinning and depinning caused by random compositional fluctuations; higher SRO decreases the degree of USFE dispersion and hence the amount of hardening. Finally, we show how the morphology of an expanding dislocation loop is affected by the applied stress. 
    Free, publicly-accessible full text available December 1, 2024
  2. Abstract Motivation

    Modern methods for computation-intensive tasks in sequence analysis (e.g., read mapping, sequence alignment, and genome assembly) often first transform each sequence into a list of short, regular-length seeds so that compact data structures and efficient algorithms can be employed to handle ever-growing volumes of data. Seeding methods using kmers (substrings of length k) have been highly successful in processing sequencing data with low mutation/error rates. However, they are much less effective for sequencing data with high error rates, as kmers cannot tolerate errors.

    Results

    We propose SubseqHash, a strategy that uses subsequences, rather than substrings, as seeds. Formally, SubseqHash maps a string of length n to its smallest subsequence of length k, k < n, according to a given order over all length-k strings. Finding the smallest subsequence of a string by enumeration is impractical because the number of subsequences grows exponentially. To overcome this barrier, we propose a novel algorithmic framework that consists of a specifically designed order (termed ABC order) and an algorithm that computes the minimized subsequence under an ABC order in polynomial time. We first show that the ABC order exhibits the desired property and that the probability of hash collision under the ABC order is close to the Jaccard index. We then show that SubseqHash overwhelmingly outperforms substring-based seeding methods in producing high-quality seed matches for three critical applications: read mapping, sequence alignment, and overlap detection. SubseqHash presents a major algorithmic breakthrough for tackling high error rates, and we expect it to be widely adopted for long-read analysis.

    Availability and implementation

    SubseqHash is freely available at https://github.com/Shao-Group/subseqhash.

     
  3. Free, publicly-accessible full text available August 4, 2024
  4. In hydrology, modeling streamflow remains a challenging task due to the limited availability of basin characteristics such as soil geology and geomorphology. These characteristics may be noisy due to measurement errors or may be missing altogether. To overcome this challenge, we propose a knowledge-guided, probabilistic inverse modeling method for recovering physical characteristics from streamflow and weather data, which are more readily available. We compare our framework with state-of-the-art inverse models for estimating river basin characteristics. We also show that these estimates improve streamflow modeling compared with using the original basin characteristic values. Our framework offers a 3% improvement in R² for the inverse model (basin characteristic estimation) and a 6% improvement for the forward model (streamflow prediction). It also offers improved explainability, since it can quantify uncertainty in both the inverse and the forward model. Uncertainty quantification plays a pivotal role in improving the explainability of machine learning models by providing additional insights into the reliability and limitations of model predictions. In our analysis, we assess the quality of the uncertainty estimates. Compared to baseline uncertainty quantification methods, our framework offers a 10% improvement in the dispersion of epistemic uncertainty and a 13% improvement in coverage rate. This information can help stakeholders understand the level of uncertainty associated with the predictions and provides a more comprehensive view of the potential outcomes. 
    Free, publicly-accessible full text available August 8, 2024
  5. Free, publicly-accessible full text available June 1, 2024
  6. Many sampling strategies commonly used in molecular dynamics, such as umbrella sampling and alchemical free energy methods, involve sampling from multiple states. The Multistate Bennett Acceptance Ratio (MBAR) formalism is a widely used way of recombining the resulting data. However, the error of the MBAR estimator is not well understood: previous error analyses of MBAR assumed independent samples. In this work, we derive a central limit theorem for MBAR estimates in the presence of correlated data, further justifying the use of MBAR in practical applications. Moreover, our central limit theorem yields an estimate of the error that can be decomposed into contributions from the individual Markov chains used to sample the states. This gives additional insight into how sampling in each state affects the overall error. We demonstrate our error estimator on an umbrella sampling calculation of the free energy of isomerization of the alanine dipeptide and an alchemical calculation of the hydration free energy of methane. Our numerical results demonstrate that the time required for the Markov chain to decorrelate in individual states can contribute considerably to the total MBAR error, highlighting the importance of accurately addressing the effect of sample correlation. 
    Free, publicly-accessible full text available June 7, 2024
  7. Free, publicly-accessible full text available May 1, 2024
  8. In this article, we present and evaluate a true random number generator (TRNG) design that is compatible with the restrictions imposed by cloud-based Field Programmable Gate Array (FPGA) providers such as Amazon Web Services (AWS) EC2 F1. Because cloud FPGA providers disallow the ring oscillator circuits that conventionally generate TRNG entropy, our design is oscillator-free and uses clock jitter as its entropy source. The clock jitter is harvested with a time-to-digital converter (TDC) and a controllable delay line that is continuously tuned to compensate for process, voltage, and temperature variations. After describing the design, we present and validate a stochastic model that conservatively quantifies its worst-case entropy. We deploy and model the design in the cloud on 60 EC2 F1 FPGA instances to ensure that sufficient randomness is captured. TRNG entropy is further validated using NIST test suites, and experiments are performed to understand how the TRNG responds to on-die power attacks that disturb the FPGA supply voltage in the vicinity of the TRNG. After introducing and validating our basic TRNG design, we introduce and validate a new variant that uses four instances of a linkable sampling module to increase the entropy per sample and improve throughput. The new variant improves throughput by 250% at the cost of a modest 17% increase in configurable logic block (CLB) count. 
    Free, publicly-accessible full text available March 31, 2024
  9. Free, publicly-accessible full text available March 1, 2024
  10. In smart grids, two-way communication between end users and the grid allows frequent data exchange, which on the one hand enhances users' experience but on the other hand increases security and privacy risks. In this paper, we propose an efficient system to address these security and privacy problems, in contrast to data aggregation schemes with high cryptographic overheads. In the proposed system, users are grouped into local communities, and trust-based blockchains are formed in each community to manage smart grid transactions, such as reporting aggregated meter readings, in a lightweight fashion. Through a detailed analysis, we show that the proposed system can meet the key security objectives. Experiments also demonstrate that the proposed system is efficient and can provide a satisfactory user experience, and that the trust value design can easily distinguish benign users from bad actors. 
    Free, publicly-accessible full text available February 18, 2024
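Entry 1 does not say which SRO measure was used. One common choice is the Warren-Cowley parameter α_ij = 1 − P(j | i)/c_j, where P(j | i) is the probability that a nearest neighbour of an i atom is a j atom and c_j is the overall concentration of j. The following Python sketch is a toy illustration under stated assumptions, not the authors' analysis: it uses a hypothetical periodic 1D chain with placeholder species labels M and N instead of the 3D BCC neighbour shells of an MC/MD configuration, and the function name `warren_cowley` is illustrative. Negative values indicate that i-j pairs are favoured, positive values that they are avoided, and zero corresponds to a random arrangement.

```python
def warren_cowley(chain: str, i: str, j: str) -> float:
    """Nearest-neighbour Warren-Cowley parameter alpha_ij on a periodic 1D chain.

    Hypothetical toy lattice: every site has exactly two neighbours
    (left and right, with periodic wrap-around).
    """
    n = len(chain)
    c_j = chain.count(j) / n  # overall concentration of species j
    # Collect the species of both neighbours of every site occupied by i.
    neighbours = []
    for site, species in enumerate(chain):
        if species == i:
            neighbours.append(chain[(site - 1) % n])
            neighbours.append(chain[(site + 1) % n])
    if not neighbours or c_j == 0:
        raise ValueError("species i and j must both be present")
    p_j_given_i = neighbours.count(j) / len(neighbours)
    return 1.0 - p_j_given_i / c_j


# Alternating chain: every neighbour of M is N, so M-N pairs are strongly favoured.
print(warren_cowley("MNMNMNMN", "M", "N"))  # -1.0
# Clustered chain: M sees N less often than its concentration predicts.
print(warren_cowley("MMMMNNNN", "M", "N"))  # 0.5
```

In a random solid solution α_ij fluctuates around zero, so its magnitude is one way to compare the degree of SRO between alloys such as MoNbTi and TaNbTi.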
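Entry 2's observation that kmers cannot tolerate errors can be seen with a few lines of Python. The sketch below (helper names `kmers` and `jaccard` are hypothetical and not taken from the SubseqHash code) shows that a single substitution destroys every k-mer that covers it, here 4 of the 7 k-mers, and also computes the Jaccard index that entry 2 uses as the reference point for hash-collision probability.

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings (k-mers) of seq, used as seeds."""
    return {seq[p:p + k] for p in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B| of two seed sets."""
    return len(a & b) / len(a | b)


reference = "ACGTTGCAAG"
read = "ACGTAGCAAG"  # hypothetical read with one substitution (T -> A) at position 4
ref_seeds, read_seeds = kmers(reference, 4), kmers(read, 4)
print(sorted(ref_seeds & read_seeds))  # ['ACGT', 'CAAG', 'GCAA']
print(jaccard(ref_seeds, read_seeds))  # 3 shared out of 11 distinct k-mers
```

With high error rates, errors fall inside most windows, so few k-mer seeds survive between a read and its true origin.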
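To make the exponential-enumeration point in entry 2 concrete, the following brute-force sketch finds the smallest length-k subsequence under an arbitrary order by trying all C(n, k) index sets. It is only an illustration: the default order is plain lexicographic order, a stand-in assumption that is not the ABC order, and the polynomial-time algorithm described in the abstract is not reproduced here. The function name and the `order_key` parameter are hypothetical.

```python
from itertools import combinations
from math import comb


def min_subsequence_bruteforce(s: str, k: int, order_key=None) -> str:
    """Smallest length-k subsequence of s under the order defined by order_key.

    Enumerates all comb(len(s), k) index sets, so the cost grows exponentially
    with len(s) in general. order_key=None means lexicographic order, which is
    only a stand-in here and NOT the ABC order used by SubseqHash.
    """
    candidates = ("".join(s[p] for p in idx) for idx in combinations(range(len(s)), k))
    if order_key is None:
        return min(candidates)
    return min(candidates, key=order_key)


print(min_subsequence_bruteforce("CAGTA", 3))                             # AGA
print(min_subsequence_bruteforce("CAGTA", 3, order_key=lambda w: w[::-1]))  # CAA
print(comb(20, 10))                                                       # 184756
```

Changing `order_key` changes which subsequence is selected as the seed. Even for n = 20 and k = 10 there are already 184,756 candidates, so enumeration quickly becomes impractical for realistic window lengths.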
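Entry 4 reports gains in R² and in coverage rate without defining them. Assuming the standard definitions (R² = 1 − SS_res/SS_tot, and coverage rate as the fraction of observations that fall inside their predicted uncertainty interval), they can be computed as in the sketch below. The function names and toy numbers are hypothetical; the abstract does not specify how the intervals or the epistemic-uncertainty dispersion metric are constructed.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def coverage_rate(y_true, lower, upper):
    """Fraction of observations lying inside their predicted [lower, upper] interval."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)


observed = [1.0, 2.0, 3.0, 4.0]  # hypothetical streamflow values
print(r_squared(observed, [1.0, 2.0, 3.0, 5.0]))               # R^2 = 1 - 1/5
print(coverage_rate(observed, [0, 2.5, 2, 3], [2, 3, 4, 3.5]))  # 0.5
```

A well-calibrated probabilistic model should reach a coverage rate close to the nominal level of its intervals (for example, about 0.9 for 90% intervals).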
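Entry 6 emphasizes that slow decorrelation within a state inflates MBAR error. The sketch below does not implement MBAR or the authors' estimator; it shows the standard single-chain analogue, in which the central-limit variance of a correlated chain's mean is approximately σ²τ/N, with τ = 1 + 2Σ_t ρ(t) the integrated autocorrelation time. Truncating the sum at the first non-positive autocorrelation is a simple heuristic chosen for this sketch, and the AR(1) test chain (for which τ = (1 + φ)/(1 − φ)) is a toy assumption. All function names are hypothetical.

```python
import random


def integrated_autocorr_time(x):
    """tau = 1 + 2 * sum_t rho(t), truncated at the first non-positive rho(t).

    The truncation rule is a simple heuristic chosen for this sketch.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    tau = 1.0
    for lag in range(1, n // 2):
        rho = sum(d[p] * d[p + lag] for p in range(n - lag)) / n / c0
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


def correlated_standard_error(x):
    """Standard error of the sample mean, inflated by the autocorrelation time."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return (var * integrated_autocorr_time(x) / n) ** 0.5


def ar1_chain(phi, n, seed=0):
    """Toy correlated chain x_{t+1} = phi * x_t + N(0, 1) noise (hypothetical test data)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


chain = ar1_chain(0.9, 50_000)
print(integrated_autocorr_time(chain))  # close to (1 + 0.9) / (1 - 0.9) = 19
```

With τ ≈ 19, the chain carries roughly as much information about its mean as N/19 independent samples, which is why the decorrelation time in each state matters for the total error.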
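Entry 8 quantifies worst-case entropy with a stochastic model that the abstract does not spell out. A common empirical counterpart is min-entropy, H_min = −log2(p_max), estimated from raw samples. The sketch below is modelled on the most-common-value estimate in NIST SP 800-90B (including its 99% upper confidence bound on p_max) and is not the authors' model; the function name and sample data are hypothetical.

```python
import math
from collections import Counter


def mcv_min_entropy(samples, confidence_bound=True):
    """Most-common-value min-entropy estimate, in bits per sample.

    H_min = -log2(p_max). With confidence_bound=True, p_max is replaced by an
    upper 99% bound, modelled on the NIST SP 800-90B most common value estimate.
    """
    n = len(samples)
    p_hat = max(Counter(samples).values()) / n
    if confidence_bound:
        p_hat = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_hat) if p_hat < 1.0 else 0.0


biased_bits = [0, 0, 0, 1] * 250  # hypothetical raw TDC output bits, 75% zeros
print(mcv_min_entropy(biased_bits, confidence_bound=False))  # -log2(0.75), about 0.415
print(mcv_min_entropy(biased_bits))                          # slightly lower, more conservative
```

Using an upper bound on p_max keeps the estimate conservative, in the same spirit as quantifying worst-case rather than average entropy.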
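Entry 10 relies on community-level blockchains that record aggregated meter readings. The abstract does not describe the block format, consensus, or trust-value computation, so the following is only a hypothetical sketch of the hash-linking idea: each block stores a community's aggregated reading plus the SHA-256 digest of the previous block, so altering an earlier aggregate breaks the chain. All names are illustrative.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical community block recording one aggregated meter reading."""
    index: int
    prev_hash: str
    aggregated_kwh: float

    def digest(self) -> str:
        # Deterministic serialization so the same block always hashes the same way.
        payload = json.dumps(
            {"index": self.index, "prev": self.prev_hash, "kwh": self.aggregated_kwh},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_aggregate(chain, meter_readings_kwh):
    """Aggregate a community's readings and link the new block to the previous one."""
    prev = chain[-1].digest() if chain else "0" * 64
    chain.append(Block(len(chain), prev, sum(meter_readings_kwh)))


def chain_is_consistent(chain):
    """True if every block's prev_hash matches the digest of its predecessor."""
    return all(chain[k].prev_hash == chain[k - 1].digest() for k in range(1, len(chain)))


ledger = []
append_aggregate(ledger, [1.2, 0.8, 2.0])
append_aggregate(ledger, [1.1, 0.9, 1.5])
print(chain_is_consistent(ledger))  # True
ledger[0].aggregated_kwh = 10.0     # tamper with an earlier aggregate
print(chain_is_consistent(ledger))  # False
```

This check only detects changes to blocks that have a successor; protecting the latest block, reaching agreement among community members, and weighting users by trust are outside this sketch.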