
Title: Sharpen data-driven prediction rules of individual large earthquakes with aid of Fourier and Gauss
Abstract

Predicting the locations, magnitudes, and timing of individual large earthquakes (EQs) remains out of reach. The author's prior study shows that individual large EQs have unique signatures obtained from multi-layered data transformations. Via spatio-temporal convolutions, decades-long EQ catalog data are transformed into pseudo-physics quantities (e.g., energy, power, vorticity, and Laplacian), which turn into surface-like information via Gauss curvatures. Using these new features, a rule-learning machine learning approach unravels promising prediction rules. This paper suggests further data transformation via the Fourier transformation (FT). Results show that the FT-based new feature can help sharpen the prediction rules. Feasibility tests on large EQs ($M \ge 6.5$) over the past 40 years in the western U.S. show promise, shedding light on data-driven prediction of individual large EQs. The handshake among ML methods, Fourier, and Gauss may help answer the long-standing enigma of seismogenesis.
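
As a rough illustration (not the paper's implementation), an FT-based feature of the kind described above could be extracted from a gridded pseudo-physics quantity as follows: the field is passed through a 2D FFT and the magnitudes of its lowest spatial-frequency modes are kept as a feature vector. The grid size, the number of retained modes, and the synthetic "energy" field below are placeholders.

```python
import numpy as np

# Minimal sketch of a Fourier-based feature from a gridded pseudo-physics field.
# Not the paper's code; field, grid size, and mode count are placeholders.

def fourier_feature(field: np.ndarray, n_modes: int = 4) -> np.ndarray:
    """Magnitudes of the lowest spatial-frequency modes of a 2D field."""
    spectrum = np.fft.fftshift(np.fft.fft2(field))   # centered 2D FFT
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    low = spectrum[cy - n_modes: cy + n_modes + 1,
                   cx - n_modes: cx + n_modes + 1]   # low-frequency block around DC
    return np.abs(low).ravel()                       # magnitude spectrum as a feature vector

rng = np.random.default_rng(0)
energy_grid = rng.random((64, 64))                   # stand-in for a convolved catalog field
features = fourier_feature(energy_grid)
print(features.shape)                                # (81,) values usable as ML features
```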

 
Award ID(s):
1931380 2129796
NSF-PAR ID:
10491691
Author(s) / Creator(s):
Publisher / Repository:
Nature
Date Published:
Journal Name:
Scientific Reports
Volume:
13
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Statistical descriptions of earthquakes offer important probabilistic information, and newly emerging technologies of high-precision observation and machine learning collectively advance our knowledge regarding complex earthquake behaviors. Still, there remains a formidable knowledge gap for predicting individual large earthquakes' locations and magnitudes. Here, this study shows that individual large earthquakes may have unique signatures that can be represented by new high-dimensional features, namely Gauss curvature-based coordinates. Particularly, the observed earthquake catalog data are transformed into a number of pseudo-physics quantities (i.e., energy, power, vorticity, and Laplacian) which turn into smooth surface-like information via spatio-temporal convolution, giving rise to the new high-dimensional coordinates. Validations with 40 years of earthquakes in the western U.S. region show that the new coordinates appear to hold uniqueness for individual large earthquakes ($M_w \ge 7.0$), and the pseudo-physics quantities help identify a customized data-driven prediction model. A Bayesian evolutionary algorithm in conjunction with flexible bases can identify a data-driven model, demonstrating promising reproduction of an individual large earthquake's location and magnitude. Results imply that an individual large earthquake can be distinguished and remembered while its best-so-far model can be customized by machine learning. This study paves a new way toward data-driven, automated evolution of individual earthquake prediction.
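
    For context, the Gauss curvature underlying the new coordinates can be computed for any smoothed, gridded surface; the snippet below is a minimal sketch using the standard Monge-patch formula on a synthetic field, not the authors' pipeline.

    ```python
    import numpy as np

    # Gauss curvature of a gridded surface z = f(x, y) via the Monge-patch formula.
    # Illustration only; the smooth bump stands in for a convolved pseudo-physics field.

    def gaussian_curvature(z: np.ndarray, dx: float, dy: float) -> np.ndarray:
        """K = (z_xx * z_yy - z_xy**2) / (1 + z_x**2 + z_y**2)**2 on a regular grid."""
        z_y, z_x = np.gradient(z, dy, dx)        # first derivatives (axis 0 = y, axis 1 = x)
        z_xy, z_xx = np.gradient(z_x, dy, dx)    # second derivatives of z_x
        z_yy, _ = np.gradient(z_y, dy, dx)       # second derivative of z_y along y
        return (z_xx * z_yy - z_xy**2) / (1.0 + z_x**2 + z_y**2) ** 2

    x, y = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
    z = np.exp(-(x**2 + y**2))
    K = gaussian_curvature(z, dx=x[0, 1] - x[0, 0], dy=y[1, 0] - y[0, 0])
    print(K.max())  # positive near the dome-like peak of the bump
    ```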

     
  2. Abstract

    Massive gully land consolidation projects, launched in China's Loess Plateau, aim to restore 2667 $\mathrm{km}^2$ of agricultural land in total by consolidating 2026 highly eroded gullies. This effort represents a social engineering project where the economic development and livelihood of the farming families are closely tied to the ability of these emergent landscapes to provide agricultural services. Whether these 'time zero' landscapes have the resilience to provide a sustainable soil condition such as soil organic carbon (SOC) content remains unknown. By studying two watersheds, one of which is a control site, we show that the consolidated gully serves as an enhanced carbon sink, where the magnitude of the SOC increase rate (1.0 $\mathrm{g\,C/m^2/year}$) is about twice that of the SOC decrease rate (−0.5 $\mathrm{g\,C/m^2/year}$) in the surrounding natural watershed. Over a 50-year co-evolution of landscape and SOC turnover, we find that the dominant mechanisms that determine the carbon cycling are different between the consolidated gully and natural watersheds. In natural watersheds, the flux of SOC transformation is mainly driven by the flux of SOC transport; but in the consolidated gully, the transport has little impact on the transformation. Furthermore, we find that extending the surface carbon residence time has the potential to efficiently enhance carbon sequestration from the atmosphere at a rate as high as 8 $\mathrm{g\,C/m^2/year}$, compared to the current 0.4 $\mathrm{g\,C/m^2/year}$. Successful completion of all gully consolidation would lead to as much as 26.67 $\mathrm{Gg\,C/year}$ sequestered into soils. This work, therefore, not only provides an assessment and guidance of the long-term sustainability of the 'time zero' landscapes but also a solution for sequestering $\mathrm{CO_2}$ into soils.
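
    For scale, the basin-wide figure can be related to an areal rate by a simple unit conversion; the sketch below only restates the quoted numbers and is not taken from the paper's analysis.

    ```python
    # Unit-conversion check relating the quoted numbers: 26.67 Gg C/year spread over the
    # full 2667 km^2 of consolidated land corresponds to a basin-average rate of roughly
    # 10 g C per m^2 per year. (Sketch only; not from the paper's analysis.)
    area_m2 = 2667.0 * 1e6          # 2667 km^2 expressed in m^2
    total_g_per_yr = 26.67e9        # 26.67 Gg C/year expressed in g C/year
    rate = total_g_per_yr / area_m2
    print(rate)                     # ~10.0 g C / m^2 / year
    ```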

     
  3. Abstract Background

    Protein–protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner, which is time-consuming.

    Results

    In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequence using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with $M$ proteins can be transformed into a much simpler problem: to find a number inside a sorted array of length $M$. This pre-screening process narrows down the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship.
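
    The search idea can be sketched in a few lines; the hash encoder below is a toy stand-in for the trained deep model, and the sequences, code length, and distance threshold are placeholders, not values from the paper.

    ```python
    import bisect

    # Sketch of the encode -> sort -> binary search -> Hamming-verify pipeline.
    # The encoder is a stand-in for the learned deep hash model.

    CODE_BITS = 16  # illustrative code length; the real model learns its codes

    def toy_hash(seq: str) -> int:
        """Stand-in encoder: a deterministic 16-bit code derived from the sequence."""
        code = 0
        for ch in seq:
            code = (code * 131 + ord(ch)) & ((1 << CODE_BITS) - 1)
        return code

    def hamming(a: int, b: int) -> int:
        """Number of differing bits between two codes."""
        return bin(a ^ b).count("1")

    # Pre-processing: encode and sort the database once (O(M log M))
    database = ["MKTAYIAKQR", "GAVLILVFAA", "MSEQNNTEMT", "PPGFSPFR"]
    codes = sorted((toy_hash(s), s) for s in database)

    # Query: binary search locates matching or numerically adjacent codes (O(log M)),
    # then the Hamming distance confirms candidates before any expensive pairwise check.
    query = "MKTAYIAKQR"
    q_code = toy_hash(query)
    i = bisect.bisect_left(codes, (q_code, ""))
    window = codes[max(0, i - 1): i + 2]
    candidates = [s for c, s in window if hamming(c, q_code) <= 2]
    print(candidates)  # contains "MKTAYIAKQR", whose code matches exactly
    ```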

    Conclusions

    The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples from four species, DHL-PPI is shown to be superior or competitive when compared to the other state-of-the-art methods in terms of precision, recall, or F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from $O(M^2)$ to $O(M \log M)$ for performing an all-against-all PPI prediction for a database with $M$ proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.

     
  4. Abstract

    We introduce a family of Finsler metrics, called the $L^p$-Fisher–Rao metrics $F_p$, for $p \in (1,\infty)$, which generalizes the classical Fisher–Rao metric $F_2$, both on the space of densities ${\text{Dens}}_+(M)$ and probability densities ${\text{Prob}}(M)$. We then study their relations to the Amari–Čencov $\alpha$-connections $\nabla^{(\alpha)}$ from information geometry: on ${\text{Dens}}_+(M)$, the geodesic equations of $F_p$ and $\nabla^{(\alpha)}$ coincide for $p = 2/(1-\alpha)$. Both are pullbacks of canonical constructions on $L^p(M)$, in which geodesics are simply straight lines. In particular, this gives a new variational interpretation of $\alpha$-geodesics as energy-minimizing curves. On ${\text{Prob}}(M)$, the $F_p$ and $\nabla^{(\alpha)}$ geodesics can still be thought of as pullbacks of natural operations on the unit sphere in $L^p(M)$, but in this case they no longer coincide unless $p = 2$. Using this transformation, we solve the geodesic equation of the $\alpha$-connection by showing that the geodesics are pullbacks of projections of straight lines onto the unit sphere, and they always cease to exist after finite time when they leave the positive part of the sphere. This unveils the geometric structure of solutions to the generalized Proudman–Johnson equations and generalizes them to higher dimensions. In addition, we calculate the associated tensors of $F_p$ and study their relation to $\nabla^{(\alpha)}$.
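
    For orientation, the classical $p = 2$ picture that these metrics generalize can be written out explicitly; the display below uses one common normalization and is only a sketch of that standard construction, not a statement from the paper.

    ```latex
    % Classical p = 2 case (one common normalization), which the L^p construction generalizes.
    % The square-root map carries the Fisher--Rao metric to the flat L^2 metric:
    \[
      \Phi(\mu) = 2\sqrt{\mu}, \qquad
      G_{\mu}(\dot\mu,\dot\mu)
      \;=\; \int_M \frac{\dot\mu^{\,2}}{\mu}\,dx
      \;=\; \int_M \bigl(D\Phi_{\mu}(\dot\mu)\bigr)^{2}\,dx .
    \]
    % Probability densities satisfy \int_M \mu\,dx = 1, hence \|\Phi(\mu)\|_{L^2} = 2,
    % so Prob(M) maps onto part of the radius-2 sphere in L^2(M), and Fisher--Rao geodesics
    % are pullbacks of great-circle arcs, i.e., projections of straight lines onto that sphere.
    ```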

     
  5. Abstract

    We continue the program of proving circuit lower bounds via circuit satisfiability algorithms. So far, this program has yielded several concrete results, proving that functions in $\mathsf{Quasi}\text{-}\mathsf{NP} = \mathsf{NTIME}[n^{(\log n)^{O(1)}}]$ and other complexity classes do not have small circuits (in the worst case and/or on average) from various circuit classes $\mathcal{C}$, by showing that $\mathcal{C}$ admits non-trivial satisfiability and/or #SAT algorithms which beat exhaustive search by a minor amount. In this paper, we present a new strong lower bound consequence of having a non-trivial #SAT algorithm for a circuit class $\mathcal{C}$. Say that a symmetric Boolean function $f(x_1,\dots,x_n)$ is sparse if it outputs 1 on $O(1)$ values of $\sum_i x_i$. We show that for every sparse $f$, and for all "typical" $\mathcal{C}$, faster #SAT algorithms for $\mathcal{C}$ circuits imply lower bounds against the circuit class $f \circ \mathcal{C}$, which may be stronger than $\mathcal{C}$ itself. In particular:

    #SAT algorithms for $n^k$-size $\mathcal{C}$-circuits running in $2^n/n^k$ time (for all $k$) imply $\mathsf{NEXP}$ does not have $(f \circ \mathcal{C})$-circuits of polynomial size.

    #SAT algorithms for $2^{n^{\varepsilon}}$-size $\mathcal{C}$-circuits running in $2^{n - n^{\varepsilon}}$ time (for some $\varepsilon > 0$) imply $\mathsf{Quasi}\text{-}\mathsf{NP}$ does not have $(f \circ \mathcal{C})$-circuits of polynomial size.

    Applying #SAT algorithms from the literature, one immediate corollary of our results is that $\mathsf{Quasi}\text{-}\mathsf{NP}$ does not have $\mathsf{EMAJ} \circ \mathsf{ACC}^0 \circ \mathsf{THR}$ circuits of polynomial size, where $\mathsf{EMAJ}$ is the "exact majority" function, improving previous lower bounds against $\mathsf{ACC}^0$ [Williams JACM'14] and $\mathsf{ACC}^0 \circ \mathsf{THR}$ [Williams STOC'14], [Murray-Williams STOC'18]. This is the first nontrivial lower bound against such a circuit class.
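
    As a reading aid (not a statement from the paper), under one common convention for the exact-majority function, EMAJ is sparse in the sense defined above, which is how the corollary follows from the general theorem:

    ```latex
    % Exact majority (one common convention) outputs 1 on a single value of the Hamming
    % weight, hence on O(1) values, so it is sparse and the theorem applies with f = EMAJ:
    \[
      \mathrm{EMAJ}(x_1,\dots,x_n) = 1
      \;\Longleftrightarrow\;
      \sum_{i=1}^{n} x_i = \Bigl\lceil \tfrac{n}{2} \Bigr\rceil .
    \]
    ```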

     