Title: A theory of high dimensional regression with arbitrary correlations between input features and target functions: sample complexity, multiple descent curves and a hierarchy of phase transitions
The performance of neural networks depends on precise relationships between four distinct ingredients: the architecture, the loss function, the statistical structure of inputs, and the ground truth target function. Much theoretical work has focused on understanding the role of the first two ingredients under highly simplified models of random uncorrelated data and target functions. In contrast, performance likely relies on a conspiracy between the statistical structure of the input distribution and the structure of the function to be learned. To understand this better we revisit ridge regression in high dimensions, which corresponds to an exceedingly simple architecture and loss function, but we analyze its performance under arbitrary correlations between input features and the target function. We find a rich mathematical structure that includes: (1) a dramatic reduction in sample complexity when the target function aligns with data anisotropy; (2) the existence of multiple descent curves; (3) a sequence of phase transitions in the performance, loss landscape, and optimal regularization as a function of the amount of data that explains the first two effects.
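The first effect listed above, the gain from a target that aligns with data anisotropy, can be illustrated with a small simulation. This is only a sketch under assumed settings and is not the paper's analysis: the power-law spectrum, the dimensions, the ridge strength, the noise level, and the helper name `ridge_excess_error` are all chosen here for illustration. Both targets are scaled to the same signal variance, so any difference in error comes from alignment alone.

```python
import numpy as np

def ridge_excess_error(w_true, eigvals, n_samples, ridge, noise_std, rng):
    """Fit ridge regression on n_samples draws and return the excess test error."""
    d = eigvals.size
    # Gaussian inputs with diagonal covariance diag(eigvals).
    X = rng.standard_normal((n_samples, d)) * np.sqrt(eigvals)
    y = X @ w_true + noise_std * rng.standard_normal(n_samples)
    # Closed-form ridge solution: (X^T X / n + ridge * I)^{-1} X^T y / n.
    A = X.T @ X / n_samples + ridge * np.eye(d)
    w_hat = np.linalg.solve(A, X.T @ y / n_samples)
    # Population excess error: (w_hat - w)^T Sigma (w_hat - w).
    delta = w_hat - w_true
    return float(delta @ (eigvals * delta))

rng = np.random.default_rng(0)
d, n = 200, 50                              # more features than samples (assumed sizes)
eigvals = 1.0 / np.arange(1, d + 1) ** 2    # anisotropic power-law spectrum (assumed)

# Both targets are scaled so that the signal variance w^T Sigma w equals 1.
w_aligned = np.zeros(d)
w_aligned[0] = 1.0 / np.sqrt(eigvals[0])      # lies along the highest-variance direction
w_misaligned = np.zeros(d)
w_misaligned[-1] = 1.0 / np.sqrt(eigvals[-1])  # lies along the lowest-variance direction

err_aligned = ridge_excess_error(w_aligned, eigvals, n, 1e-2, 0.1, rng)
err_misaligned = ridge_excess_error(w_misaligned, eigvals, n, 1e-2, 0.1, rng)
print(f"aligned target:    excess error = {err_aligned:.4f}")
print(f"misaligned target: excess error = {err_misaligned:.4f}")
```

With the same number of samples, the aligned target should be learned almost perfectly, while the misaligned one should stay close to the error of predicting zero. Sweeping `n` in this sketch is one way to explore how the error curve changes with the amount of data.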
Award ID(s):
1845166
NSF-PAR ID:
10293707
Journal Name:
International Conference on Machine Learning
Volume:
139
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer by layer a compositional hypothesis class (i.e., a feedforward, multilayer architecture) in a supervised setting. In terms of the models, we present a principled method to “kernelize” (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart, Hinton, & Williams, 1986). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide instantiations of the abstract framework under certain architectures and objective functions. Based on these instantiations, we present a layer-wise training algorithm for an l-layer feedforward network for classification, where l≥2 can be arbitrary. This algorithm can be given an intuitive geometric interpretation that makes the learning dynamics transparent. Empirical results are provided to complement our theory. We show that the kernelized networks, trained layer-wise, compare favorably with classical kernel machines as well as other connectionist models trained by BP. We also visualize the inner workings of the greedy kernelized models to validate our claim on the transparency of the layer-wise algorithm.
  2. The traditional von Neumann architecture limits gains in computing efficiency and results in massive power consumption in modern computers because storage and processing units are separated. The neuromorphic computation system, an in-memory computing architecture with low power consumption, aims to break this bottleneck and meet the needs of the next generation of artificial intelligence (AI) systems. Thus, it is urgent to find a memory technology with which to implement the neuromorphic computing nanosystem. Silicon-based flash memory currently dominates the non-volatile memory market; however, it struggles to meet the requirements of future data storage devices because of drawbacks such as scaling limits, relatively slow operation speed, and high voltages for program/erase operations. The emerging resistive random-access memory (RRAM) has prompted extensive research owing to its simple two-terminal structure, which comprises a top electrode (TE) layer, a bottom electrode (BE) layer, and an intermediate resistive switching (RS) layer. It can utilize a temporary and reversible dielectric breakdown to cause the RS phenomenon between the high resistance state (HRS) and the low resistance state (LRS). RRAM is expected to outperform conventional memory devices, notably through its low-voltage operation, short programming time, great cyclic stability, and good scalability. Among the materials for the RS layer, indium gallium zinc oxide (IGZO) is attractive for its abundance, the high diffusivity of its oxygen atoms, and its transparency. Additionally, its electrical properties can be easily modulated by controlling the stoichiometric ratio of indium and gallium as well as the oxygen potential in the sputter gas. Moreover, since IGZO can be applied to both the thin-film transistor (TFT) channel and the RS layer, it has great potential for fully integrated transparent electronics.
In this work, we proposed amorphous transparent IGZO-based RRAMs and investigated the switching behaviors of memory cells prepared with different top electrodes. First, ITO was chosen to serve as both TE and BE to achieve high transmittance. A multi-target magnetron sputtering system was employed to deposit all three layers (TE, RS, and BE) on a glass substrate. I-V characteristics were evaluated with a semiconductor parameter analyzer, and the bipolar RS feature of our RRAM devices was demonstrated by typical butterfly curves. The optical transmission analysis was carried out with a UV-Vis spectrometer, and the average transmittance of the entire devices was around 80% in the visible-light wavelength range, implying high transparency. We adjusted the oxygen partial pressure during the sputtering of IGZO to optimize its properties because the oxygen vacancy concentration governs the RS performance. Electrode selection is crucial and can impact the performance of the whole device. Thus, a Cu TE was chosen for our second type of device because the diffusion of Cu ions can be beneficial for the formation of the conductive filament (CF). A ~5 nm SiO2 barrier layer was employed between the TE and RS layers to confine the diffusion of Cu into the RS layer. At the same time, this inserted SiO2 layer can provide an additional interfacial series resistance in the device to lower the off current and consequently improve the on/off ratio and overall performance. Finally, the oxygen-affinity metal Ti was selected as the TE for our third type of device because the concentration of oxygen atoms can be shifted towards the Ti electrode, which provides an oxygen-gettering activity near the Ti metal. This process may in turn lead to the formation of a sub-stoichiometric region in the neighboring oxide that is believed to be the origin of better performance. In conclusion, the transparent amorphous IGZO-based RRAMs were established.
To tune the properties of the RS layer, its sputtering conditions were varied. To investigate the influence of TE selection on the switching performance of RRAMs, we integrated a set of TE materials and a barrier layer into IGZO-based RRAMs and compared the switching characteristics. Our encouraging results clearly demonstrate that IGZO is a promising material for RRAM applications and for breaking the bottleneck of current memory technologies.
  3. Abstract
    Site description. This data package consists of data obtained from sampling surface soil (the 0-7.6 cm depth profile) in black mangrove (Avicennia germinans) dominated forest and black needlerush (Juncus roemerianus) saltmarsh along the Gulf of Mexico coastline in peninsular west-central Florida, USA. This location has a subtropical climate with mean daily temperatures ranging from 15.4 °C in January to 27.8 °C in August, and annual precipitation of 1336 mm. Precipitation falls as rain primarily between June and September. Tides are semi-diurnal, with 0.57 m median amplitudes during the year preceding sampling (U.S. NOAA National Ocean Service, Clearwater Beach, Florida, station 8726724). Sea-level rise is 4.0 ± 0.6 mm per year (1973-2020 trend, mean ± 95 % confidence interval, NOAA NOS Clearwater Beach station). The A. germinans mangrove zone is either adjacent to water or fringed on the seaward side by a narrow band of red mangrove (Rhizophora mangle). A near-monoculture of J. roemerianus is often adjacent to and immediately landward of the A. germinans zone. The transition from the mangrove to the J. roemerianus zone is variable in our study area. An abrupt edge between closed-canopy mangrove and J. roemerianus monoculture may extend for up to several hundred meters
  4. Introduction: Computed tomography perfusion (CTP) imaging requires injection of an intravenous contrast agent and increased exposure to ionizing radiation. This process can be lengthy, costly, and potentially dangerous to patients, especially in emergency settings. We propose MAGIC, a multitask, generative adversarial network-based deep learning model to synthesize an entire CTP series from only a non-contrasted CT (NCCT) input. Materials and Methods: NCCT and CTP series were retrospectively retrieved from 493 patients at UF Health with IRB approval. The data were deidentified and all images were resized to 256x256 pixels. The collected perfusion data were analyzed using the RapidAI CT Perfusion analysis software (iSchemaView, Inc. CA) to generate each CTP map. For each subject, 10 CTP slices were selected. Each slice was paired with one NCCT slice at the same location and two NCCT slices at a predefined vertical offset, resulting in 4.3K CTP images and 12.9K NCCT images used for training. The incorporation of a spatial offset into the NCCT input allows MAGIC to more accurately synthesize cerebral perfusive structures, increasing the quality of the generated images. The studies included a variety of indications, including healthy tissue, mild infarction, and severe infarction. The proposed MAGIC model incorporates a novel multitask architecture, allowing for the simultaneous synthesis of four CTP modalities: mean transit time (MTT), cerebral blood flow (CBF), cerebral blood volume (CBV), and time to peak (TTP). We propose a novel Physicians-in-the-loop module in the model's architecture, acting as a tunable layer that allows physicians to manually adjust the amount of anatomic detail present in the synthesized CTP series. Additionally, we propose two novel loss terms: multi-modal connectivity loss and extrema loss.
The multi-modal connectivity loss leverages the multi-task nature of the model to enforce the mathematical relationship between MTT, CBF, and CBV. The extrema loss aids in learning regions of elevated and decreased activity in each modality, allowing MAGIC to accurately learn the characteristics of diagnostic regions of interest. Corresponding NCCT and CTP slices were paired along the vertical axis. The model was trained for 100 epochs on an NVIDIA TITAN X GPU. Results and Discussion: The MAGIC model’s performance was evaluated on a sample of 40 patients from the UF Health dataset. Across all CTP modalities, MAGIC was able to accurately produce images with high structural agreement between the entire synthesized and clinical perfusion images (SSIMmean=0.801, UQImean=0.926). MAGIC was able to synthesize CTP images to accurately characterize cerebral circulatory structures and identify regions of infarct tissue, as shown in Figure 1. A blind binary evaluation was conducted to assess the presence of cerebral infarction in both the synthesized and clinical perfusion images, resulting in the synthesized images correctly predicting the presence of cerebral infarction with 87.5% accuracy. Conclusions: We proposed a MAGIC model whose novel deep learning structures and loss terms enable high-quality synthesis of CTP maps and characterization of circulatory structures solely from NCCT images, potentially eliminating the requirement for the injection of an intravenous contrast agent and elevated radiation exposure during perfusion imaging. This makes MAGIC a beneficial tool in a clinical scenario, increasing the overall safety, accessibility, and efficiency of cerebral perfusion and facilitating better patient outcomes. Acknowledgements: This work was partially supported by the National Science Foundation, IIS-1908299 III: Small: Modeling Multi-Level Connectivity of Brain Dynamics + REU Supplement, to the University of Florida.
  5. Systems for ML inference are widely deployed today, but they typically optimize ML inference workloads using techniques designed for conventional data serving workloads and miss critical opportunities to leverage the statistical nature of ML. In this paper, we present WILLUMP, an optimizer for ML inference that introduces two statistically-motivated optimizations targeting ML applications whose performance bottleneck is feature computation. First, WILLUMP automatically cascades feature computation for classification queries: WILLUMP classifies most data inputs using only high-value, low-cost features selected through empirical observations of ML model performance, improving query performance by up to 5× without statistically significant accuracy loss. Second, WILLUMP accurately approximates ML top-K queries, discarding low-scoring inputs with an automatically constructed approximate model and then ranking the remainder with a more powerful model, improving query performance by up to 10× with minimal accuracy loss. WILLUMP automatically tunes these optimizations’ parameters to maximize query performance while meeting an accuracy target. Moreover, WILLUMP complements these statistical optimizations with compiler optimizations to automatically generate fast inference code for ML applications. We show that WILLUMP improves the end-to-end performance of real-world ML inference pipelines curated from major data science competitions by up to 16× without statistically significant loss of accuracy.
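The cascade idea in the last item (classify most inputs from cheap features and compute costly ones only when needed) can be sketched in a few lines. This is a hypothetical illustration, not WILLUMP's API: the function `cascade_predict`, the fixed confidence threshold, and the toy models are all assumptions, whereas the abstract says the real system selects features and tunes parameters automatically.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Dict[str, float]

def cascade_predict(
    inputs: Sequence[Example],
    cheap_model: Callable[[Example], float],
    full_model: Callable[[Example], float],
    threshold: float = 0.9,
) -> Tuple[List[bool], int]:
    """Classify with the cheap model, escalating only when it is unsure.

    Both models return the probability of the positive class.
    Returns the predicted labels and the number of escalations.
    """
    labels: List[bool] = []
    escalated = 0
    for x in inputs:
        p = cheap_model(x)
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            # Confident enough: skip the costly features entirely.
            labels.append(p >= 0.5)
        else:
            # Unsure: pay for the full feature set and the stronger model.
            escalated += 1
            labels.append(full_model(x) >= 0.5)
    return labels, escalated

# Toy stand-ins: each "model" simply reads a precomputed score (hypothetical data).
examples = [
    {"cheap": 0.97, "costly": 0.99},
    {"cheap": 0.55, "costly": 0.10},
    {"cheap": 0.02, "costly": 0.05},
    {"cheap": 0.60, "costly": 0.80},
]
labels, escalated = cascade_predict(
    examples,
    cheap_model=lambda x: x["cheap"],
    full_model=lambda x: x["costly"],
)
print(labels, escalated)
```

In this toy run, only the two low-confidence inputs reach the full model. Raising `threshold` trades query speed for accuracy, which is the kind of knob the abstract describes as being tuned against an accuracy target.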