skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 5:00 PM ET until 11:00 PM ET on Friday, June 21 due to maintenance. We apologize for the inconvenience.


Title: A theory of high dimensional regression with arbitrary correlations between input features and target functions: sample complexity, multiple descent curves and a hierarchy of phase transitions
The performance of neural networks depends on precise relationships between four distinct ingredients: the architecture, the loss function, the statistical structure of inputs, and the ground truth target function. Much theoretical work has focused on understanding the role of the first two ingredients under highly simplified models of random uncorrelated data and target functions. In contrast, performance likely relies on a conspiracy between the statistical structure of the input distribution and the structure of the function to be learned. To understand this better we revisit ridge regression in high dimensions, which corresponds to an exceedingly simple architecture and loss function, but we analyze its performance under arbitrary correlations between input features and the target function. We find a rich mathematical structure that includes: (1) a dramatic reduction in sample complexity when the target function aligns with data anisotropy; (2) the existence of multiple descent curves; (3) a sequence of phase transitions in the performance, loss landscape, and optimal regularization as a function of the amount of data that explains the first two effects.  more » « less
Award ID(s):
1845166
NSF-PAR ID:
10293707
Author(s) / Creator(s):
;
Date Published:
Journal Name:
International Conference on Machine Learning
Volume:
139
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Nonlinear response history analysis (NLRHA) is generally considered to be a reliable and robust method to assess the seismic performance of buildings under strong ground motions. While NLRHA is fairly straightforward to evaluate individual structures for a select set of ground motions at a specific building site, it becomes less practical for performing large numbers of analyses to evaluate either (1) multiple models of alternative design realizations with a site‐specific set of ground motions, or (2) individual archetype building models at multiple sites with multiple sets of ground motions. In this regard, surrogate models offer an alternative to running repeated NLRHAs for variable design realizations or ground motions. In this paper, a recently developed surrogate modeling technique, called probabilistic learning on manifolds (PLoM), is presented to estimate structural seismic response. Essentially, the PLoM method provides an efficient stochastic model to develop mappings between random variables, which can then be used to efficiently estimate the structural responses for systems with variations in design/modeling parameters or ground motion characteristics. The PLoM algorithm is introduced and then used in two case studies of 12‐story buildings for estimating probability distributions of structural responses. The first example focuses on the mapping between variable design parameters of a multidegree‐of‐freedom analysis model and its peak story drift and acceleration responses. The second example applies the PLoM technique to estimate structural responses for variations in site‐specific ground motion characteristics. In both examples, training data sets are generated for orthogonal input parameter grids, and test data sets are developed for input parameters with prescribed statistical distributions. Validation studies are performed to examine the accuracy and efficiency of the PLoM models. Overall, both examples show good agreement between the PLoM model estimates and verification data sets. Moreover, in contrast to other common surrogate modeling techniques, the PLoM model is able to preserve correlation structure between peak responses. Parametric studies are conducted to understand the influence of different PLoM tuning parameters on its prediction accuracy.

     
    more » « less
  2. We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer by layer a compositional hypothesis class (i.e., a feedforward, multilayer architecture) in a supervised setting. In terms of the models, we present a principled method to “kernelize” (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart, Hinton, & Williams, 1986). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide instantiations of the abstract framework under certain architectures and objective functions. Based on these instantiations, we present a layer-wise training algorithm for an l-layer feedforward network for classification, where l≥2 can be arbitrary. This algorithm can be given an intuitive geometric interpretation that makes the learning dynamics transparent. Empirical results are provided to complement our theory. We show that the kernelized networks, trained layer-wise, compare favorably with classical kernel machines as well as other connectionist models trained by BP. We also visualize the inner workings of the greedy kernelized models to validate our claim on the transparency of the layer-wise algorithm. 
    more » « less
  3. By mimicking biomimetic synaptic processes, the success of artificial intelligence (AI) has been astounding with various applications such as driving automation, big data analysis, and natural-language processing.[1-4] Due to a large quantity of data transmission between the separated memory unit and the logic unit, the classical computing system with von Neumann architecture consumes excessive energy and has a significant processing delay.[5] Furthermore, the speed difference between the two units also causes extra delay, which is referred to as the memory wall.[6, 7] To keep pace with the rapid growth of AI applications, enhanced hardware systems that particularly feature an energy-efficient and high-speed hardware system need to be secured. The novel neuromorphic computing system, an in-memory architecture with low power consumption, has been suggested as an alternative to the conventional system. Memristors with analog-type resistive switching behavior are a promising candidate for implementing the neuromorphic computing system since the devices can modulate the conductance with cycles that act as synaptic weights to process input signals and store information.[8, 9]

    The memristor has sparked tremendous interest due to its simple two-terminal structure, including top electrode (TE), bottom electrode (BE), and an intermediate resistive switching (RS) layer. Many oxide materials, including HfO2, Ta2O5, and IGZO, have extensively been studied as an RS layer of memristors. Silicon dioxide (SiO2) features 3D structural conformity with the conventional CMOS technology and high wafer-scale homogeneity, which has benefited modern microelectronic devices as dielectric and/or passivation layers. Therefore, the use of SiO2as a memristor RS layer for neuromorphic computing is expected to be compatible with current Si technology with minimal processing and material-related complexities.

    In this work, we proposed SiO2-based memristor and investigated switching behaviors metallized with different reduction potentials by applying pure Cu and Ag, and their alloys with varied ratios. Heavily doped p-type silicon was chosen as BE in order to exclude any effects of the BE ions on the memristor performance. We previously reported that the selection of TE is crucial for achieving a high memory window and stable switching performance. According to the study which compares the roles of Cu (switching stabilizer) and Ag (large switching window performer) TEs for oxide memristors, we have selected the TE materials and their alloys to engineer the SiO2-based memristor characteristics. The Ag TE leads to a larger memory window of the SiO2memristor, but the device shows relatively large variation and less reliability. On the other hand, the Cu TE device presents uniform gradual switching behavior which is in line with our previous report that Cu can be served as a stabilizer, but with small on/off ratio.[9] These distinct performances with Cu and Ag metallization leads us to utilize a Cu/Ag alloy as the TE. Various compositions of Cu/Ag were examined for the optimization of the memristor TEs. With a Cu/Ag alloying TE with optimized ratio, our SiO2based memristor demonstrates uniform switching behavior and memory window for analog switching applications. Also, it shows ideal potentiation and depression synaptic behavior under the positive/negative spikes (pulse train).

    In conclusion, the SiO2memristors with different metallization were established. To tune the property of RS layer, the sputtering conditions of RS were varied. To investigate the influence of TE selections on switching performance of memristor, we integrated Cu, Ag and Cu/Ag alloy as TEs and compared the switch characteristics. Our encouraging results clearly demonstrate that SiO2with Cu/Ag is a promising memristor device with synaptic switching behavior in neuromorphic computing applications.

    Acknowledgement

    This work was supported by the U.S. National Science Foundation (NSF) Award No. ECCS-1931088. S.L. and H.W.S. acknowledge the support from the Improvement of Measurement Standards and Technology for Mechanical Metrology (Grant No. 22011044) by KRISS.

    References

    [1] Younget al.,IEEE Computational Intelligence Magazine,vol. 13, no. 3, pp. 55-75, 2018.

    [2] Hadsellet al.,Journal of Field Robotics,vol. 26, no. 2, pp. 120-144, 2009.

    [3] Najafabadiet al.,Journal of Big Data,vol. 2, no. 1, p. 1, 2015.

    [4] Zhaoet al.,Applied Physics Reviews,vol. 7, no. 1, 2020.

    [5] Zidanet al.,Nature Electronics,vol. 1, no. 1, pp. 22-29, 2018.

    [6] Wulfet al.,SIGARCH Comput. Archit. News,vol. 23, no. 1, pp. 20–24, 1995.

    [7] Wilkes,SIGARCH Comput. Archit. News,vol. 23, no. 4, pp. 4–6, 1995.

    [8] Ielminiet al.,Nature Electronics,vol. 1, no. 6, pp. 333-343, 2018.

    [9] Changet al.,Nano Letters,vol. 10, no. 4, pp. 1297-1301, 2010.

    [10] Qinet al., Physica Status Solidi (RRL) - Rapid Research Letters, pssr.202200075R1, In press, 2022.

     
    more » « less
  4. The National Space Weather Action Plan, NERC Reliability Standard TPL-007-1,2 associated FERC Orders 779, 830 and subsequent actions, emerging standard TPL-007-3, as well as Executive Order 13744 have prepared the regulatory framework and roadmap for assessing and mitigating the impact on critical infrastructure from space weather. These actions have resulted in an emerging set of benchmarks against which the statistical probability of damage to critical components such as power transmission system high-voltage transformers can be assessed; for the first time the impacts on the intensity of geomagnetically induced currents (GICs) due to the spatial variability of the geomagnetic field and of the Earth’s electrical conductivity structure can be examined systematically. While at present not a strict requirement of the existing reliability standards, there is growing evidence that the strongly three-dimensional nature of the electrical conductivity structure of the North American crust and mantle (heretofore ‘ground conductivity’) has a first-order impact on GIC intensity, with considerable local and regional variability. The strongly location dependent ground electric field intensification and attenuation due to 3-D ground conductivity variations has an equivalent impact on assessment of risk to critical infrastructure due to HEMP (E3 phase) sources of geomagnetic disturbances (GMDs) as it does for natural GMDs. From 2006-2018, Oregon State University (OSU) under NSF EarthScope Program support, installed and acquired ground electric and magnetic field time series (magnetotelluric, or MT) data on a grid of station locations spaced ~70-km apart, at 1161 long-period MT stations covering nearly ⅔ of CONUS. The US Geological Survey completed 47 additional MT stations using functionally identical instrumentation, and the two data sets were merged and made available in the public domain. OSU and its project collaborators have also collected hundreds of wider frequency bandwidth, more densely-spaced MT station data under other project support, and these have been or will be released for public access in the near future. NSF funding was not available to make possible collection of EarthScope MT data 1/3 of CONUS in the southern tier of states, in a band from central California in the west to Alabama in the east and extending along the Gulf Coast and Deep South. OSU, with NASA support just received, plans to complete MT station installation in the remainder of California this year, and with additional support both anticipated and proposed, we hope to complete the MT array in the remainder of CONUS. For this first time this will provide national-scale 3-D electrical conductivity/MT impedance data throughout the US portion of the contiguous North American power grid. Complementary planning and proposal efforts are underway in Canada, including collaborations between OSU, Athabasca University and other Canadian academic and industry groups. In the present work, we apply algorithms we have developed to make use of real-time streams of US Geological Survey, Natural Resources Canada (and other) magnetic observatory data, and the EarthScope and other MT data sets to provide quasi-real time predictions of the geomagnetically induced voltages at high-voltage transmission system transformers/power buses. This goes beyond the statistical benchmarking process currently encapsulated in NERC reliability standards. We seek initially to provide real-time information to power utility control room operators, in the form of a heat map showing which assets are likely experiencing stress due to induced currents. These assessments will be ground-truthed against transmission system sensor data (PMUs, GIC monitors, voltage waveforms and harmonics where available), and by applying machine learning methods we hope to extend this approach to transmission systems that have sparse or non-existent GIC monitoring sensor infrastructure. Ultimately by incorporating predictive models of the geomagnetic field using satellite data as inputs rather than real-time ground magnetic field measurements, a near-term probabilistic assessment of risk to transformers may be possible, ideally providing at least a 15-minute forecast to utility operators. There has been a concerted effort by NOAA to develop a real-time geomagnetically induced ground electric field data product that makes use of our EarthScope MT data, which includes the strong impacts on GICs due to 3-D ground conductivity structure. Both OSU and the USGS have developed methods to determine the GIC-related voltages at substations by integrating the ground electric fields along power transmission line paths. Under National Science Foundation support, the present team of investigators is taking the next step, of applying the GIC-related voltages as inputs to quasi-real time power flow models of the power transmission grid in order to obtain realistic and verifiable predictions of the intensity of induced GICs, the reactive power loss due to GICs, and of GIC effects on the current and voltage waveforms, such as the harmonic distortion. As we work toward integration of predicted induced substation voltages with power flow models, we’ve modified the RTS-GMLC (Reliability Test System Grid Modernization Lab Consortium) test case (https://github.com/GridMod/RTS-GMLC) by moving the geographic location of the case to central Oregon. With the assistance of LANL we have the complete AC and DC network of the RTS-GMLC case, and we are working to integrate the complete case information into Julia (using the PowerModels and PowerModelsGMD packages of LANL), or into PowerWorld. Along a parallel track, we have performed GIC voltage calculations using our geophysical algorithm for a realistic GMD event (Halloween event) for the test case, resulting in GIC transmission line voltages that can be added into our power system model. We’ll discuss our progress in integrating the geophysical estimates of transformer voltages and our DC model using LANL's Julia and PowerModelsGMD package, for power flow simulations on the test case, and to determine the GIC flows and possible impacts on the power waveforms in the system elements. 
    more » « less
  5. The traditional von Neumann architecture limits the increase in computing efficiency and results in massive power consumption in modern computers due to the separation of storage and processing units. The novel neuromorphic computation system, an in-memory computing architecture with low power consumption, is aimed to break the bottleneck and meet the needs of the next generation of artificial intelligence (AI) systems. Thus, it is urgent to find a memory technology to implement the neuromorphic computing nanosystem. Nowadays, the silicon-based flash memory dominates non-volatile memory market, however, it is facing challenging issues to achieve the requirements of future data storage device development due to the drawbacks, such as scaling issue, relatively slow operation speed, and high voltage for program/erase operations. The emerging resistive random-access memory (RRAM) has prompted extensive research as its simple two-terminal structure, including top electrode (TE) layer, bottom electrode (BE) layer, and an intermediate resistive switching (RS) layer. It can utilize a temporary and reversible dielectric breakdown to cause the RS phenomenon between the high resistance state (HRS) and the low resistance state (LRS). RRAM is expected to outperform conventional memory device with the advantages, notably its low-voltage operation, short programming time, great cyclic stability, and good scalability. Among the materials for RS layer, indium gallium zinc oxide (IGZO) has shown attractive prospects in abundance and high atomic diffusion property of oxygen atoms, transparency. Additionally, its electrical properties can be easily modulated by controlling the stoichiometric ratio of indium and gallium as well as oxygen potential in the sputter gas. Moreover, since the IGZO can be applied to both the thin-film transistor (TFT) channel and RS layer, it has a great potential for fully integrated transparent electronics application. In this work, we proposed amorphous transparent IGZO-based RRAMs and investigated switching behaviors of the memory cells prepared with different top electrodes. First, ITO was choosing to serve as both TE and BE to achieve high transmittance. A multi-target magnetron sputtering system was employed to deposit all three layers (TE, RS, BE layers) on glass substrate. I-V characteristics were evaluated by a semiconductor parameter analyzer, and the bipolar RS feature of our RRAM devices was demonstrated by typical butterfly curves. The optical transmission analysis was carried out via a UV-Vis spectrometer and the average transmittance was around 80% out of entire devices in the visible-light wavelength range, implying high transparency. We adjusted the oxygen partial pressure during the sputtering of IGZO to optimize the property because the oxygen vacancy concentration governs the RS performance. Electrode selection is crucial and can impact the performance of the whole device. Thus, Cu TE was chosen for our second type of device because the diffusion of Cu ions can be beneficial for the formation of the conductive filament (CF). A ~5 nm SiO 2 barrier layer was employed between TE and RS layers to confine the diffusion of Cu into the RS layer. At the same time, this SiO 2 inserting layer can provide an additional interfacial series resistance in the device to lower the off current, consequently, improve the on/off ratio and whole performance. Finally, an oxygen affinity metal Ti was selected as the TE for our third type of device because the concentration of the oxygen atoms can be shifted towards the Ti electrode, which provides an oxygengettering activity near the Ti metal. This process may in turn lead to the formation of a sub-stoichiometric region in the neighboring oxide that is believed to be the origin of better performance. In conclusion, the transparent amorphous IGZO-based RRAMs were established. To tune the property of RS layer, the sputtering conditions of RS were varied. To investigate the influence of TE selections on switching performance of RRAMs, we integrated a set of TE materials, and a barrier layer on IGZO-based RRAM and compared the switch characteristics. Our encouraging results clearly demonstrate that IGZO is a promising material in RRAM applications and breaking the bottleneck of current memory technologies. 
    more » « less