
Search for: All records

Creators/Authors contains: "Lee, Jason"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available June 30, 2024
  2. Traditional analyses of gradient descent show that when the largest eigenvalue of the Hessian, also known as the sharpness S(θ), is bounded by 2/η, training is "stable" and the training loss decreases monotonically. Recent works, however, have observed that this assumption does not hold when training modern neural networks with full batch or large batch gradient descent. Most recently, Cohen et al. (2021) observed two important phenomena. The first, dubbed progressive sharpening, is that the sharpness steadily increases throughout training until it reaches the instability cutoff 2/η. The second, dubbed edge of stability, is that the sharpness hovers at 2/η for the remainder of training while the loss continues decreasing, albeit non-monotonically. We demonstrate that, far from being chaotic, the dynamics of gradient descent at the edge of stability can be captured by a cubic Taylor expansion: as the iterates diverge in the direction of the top eigenvector of the Hessian due to instability, the cubic term in the local Taylor expansion of the loss function causes the curvature to decrease until stability is restored. This property, which we call self-stabilization, is a general property of gradient descent and explains its behavior at the edge of stability. A key consequence of self-stabilization is that gradient descent at the edge of stability implicitly follows projected gradient descent (PGD) under the constraint S(θ) ≤ 2/η. Our analysis provides precise predictions for the loss, sharpness, and deviation from the PGD trajectory throughout training, which we verify both empirically in a number of standard settings and theoretically under mild conditions. Our analysis uncovers the mechanism for gradient descent's implicit bias towards stability.
    Free, publicly-accessible full text available May 28, 2024
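The stability condition in the abstract above, S(θ) ≤ 2/η, can be checked numerically. The sketch below runs full-batch gradient descent on a hypothetical two-parameter toy loss (my own stand-in, not the paper's setting) and tracks the sharpness, i.e. the top Hessian eigenvalue, against the 2/η cutoff; with a small step size the iterates stay in the stable regime and the loss decreases monotonically, matching the classical analysis the abstract starts from.

```python
import numpy as np

# Toy illustration of the sharpness/stability relationship described above.
# The loss L(a, b) = 0.5 * (a*b - 1)^2 is a hypothetical stand-in, not the
# networks studied in the paper.

def loss(theta):
    a, b = theta
    return 0.5 * (a * b - 1.0) ** 2

def grad(theta):
    a, b = theta
    u = a * b - 1.0
    return np.array([u * b, u * a])

def hessian(theta):
    a, b = theta
    u = a * b - 1.0
    return np.array([[b * b, u + a * b],
                     [u + a * b, a * a]])

def sharpness(theta):
    """S(theta): largest eigenvalue of the Hessian."""
    return np.linalg.eigvalsh(hessian(theta))[-1]

eta = 0.05                      # step size; stability cutoff is 2/eta = 40
theta = np.array([2.5, 0.1])
losses, sharps = [], []
for _ in range(200):
    losses.append(loss(theta))
    sharps.append(sharpness(theta))
    theta = theta - eta * grad(theta)

# While S(theta) stays below 2/eta along the whole trajectory, the loss
# decreases monotonically; edge-of-stability behavior appears only once the
# sharpness reaches the cutoff.
```
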
  3. Key-value (KV) software has proven useful to a wide variety of applications including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of underlying solid-state block devices. Emerging KV storage devices are a promising technology for both simplifying the KV software stack and improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get operations, existing KV storage devices don't natively support range queries, which are critical to all three types of applications described above. In this paper, we present KVRangeDB, a software layer that enables processing range queries for existing hash-based KV solid-state disks (KVSSDs). To adapt to the performance characteristics of emerging KVSSDs, KVRangeDB implements a log-structured merge tree key index that reduces compaction I/O, merges keys when possible, and provides separate caches for indexes and values. We evaluated KVRangeDB under a set of representative workloads and compared its performance with two existing database solutions: a RocksDB variant ported to work with the KVSSD, and WiscKey, a key-value database that is carefully tuned for conventional block devices. On filesystem aging workloads, KVRangeDB outperforms WiscKey by 23.7x in terms of throughput and reduces CPU usage and external write amplification by 14.3x and 9.8x, respectively.
    Free, publicly-accessible full text available January 1, 2024
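The core idea in the abstract above, layering a sorted key index over a hash-based KV store so range queries become possible, can be sketched in a few lines. This is a deliberately simplified illustration: KVRangeDB itself uses an LSM-tree index with compaction and separate caches on a real KVSSD, whereas the hypothetical class below uses an in-memory dict and a plain sorted list as stand-ins.

```python
import bisect

# Minimal sketch of the idea behind a range-query layer over a hash-based
# KV device: point put/get go to the hash store, while a separately
# maintained sorted key index supplies ordering for range scans.

class RangeKV:
    def __init__(self):
        self._store = {}     # stands in for the hash-based KVSSD
        self._index = []     # sorted key list (stands in for the LSM index)

    def put(self, key, value):
        if key not in self._store:
            bisect.insort(self._index, key)
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

    def range_query(self, lo, hi):
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self._index, lo)
        j = bisect.bisect_left(self._index, hi)
        return [(k, self._store[k]) for k in self._index[i:j]]

db = RangeKV()
for k, v in [("user:0003", "c"), ("user:0001", "a"), ("user:0002", "b")]:
    db.put(k, v)
# keys from "user:0001" (inclusive) to "user:0003" (exclusive), in order
pairs = db.range_query("user:0001", "user:0003")
```
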
  4. Abstract Electronic textiles capable of sensing, powering, and communication can be used to non-intrusively monitor human health during daily life. However, achieving these functionalities with clothing is challenging because of limitations in the electronic performance, flexibility and robustness of the underlying materials, which must endure repeated mechanical, thermal and chemical stresses during daily use. Here, we demonstrate electronic textile systems with functionalities in near-field powering and communication created by digital embroidery of liquid metal fibers. Owing to the unique electrical and mechanical properties of the liquid metal fibers, these electronic textiles can conform to body surfaces and establish robust wireless connectivity with nearby wearable or implantable devices, even during strenuous exercise. By transferring optimized electromagnetic patterns onto clothing in this way, we demonstrate a washable electronic shirt that can be wirelessly powered by a smartphone and continuously monitor axillary temperature without interfering with daily activities.
    Free, publicly-accessible full text available December 1, 2023
  5. Loh, Po-Ling; Raginsky, Maxim (Eds.)
    Significant theoretical work has established that in specific regimes, neural networks trained by gradient descent behave like kernel methods. However, in practice, it is known that neural networks strongly outperform their associated kernels. In this work, we explain this gap by demonstrating that there is a large class of functions which cannot be efficiently learned by kernel methods but can be easily learned with gradient descent on a two-layer neural network outside the kernel regime, by learning representations that are relevant to the target task. We also demonstrate that these representations allow for efficient transfer learning, which is impossible in the kernel regime. Specifically, we consider the problem of learning polynomials which depend on only a few relevant directions, i.e. of the form $f^\star(x) = g(Ux)$ where $U: \mathbb{R}^d \to \mathbb{R}^r$ with $d \gg r$. When the degree of $f^\star$ is $p$, it is known that $n \asymp d^p$ samples are necessary to learn $f^\star$ in the kernel regime. Our primary result is that gradient descent learns a representation of the data which depends only on the directions relevant to $f^\star$. This results in an improved sample complexity of $n \asymp d^2$ and enables transfer learning with sample complexity independent of $d$.
  6. Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability). Despite the recent efforts on relaxing these assumptions, existing works are only able to relax one of the two factors, leaving the strong assumption on the other factor intact. As an important open problem, can we achieve sample-efficient offline RL with weak assumptions on both factors? In this paper we answer the question in the positive. We analyze a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables (discounted occupancy) are modeled using a density-ratio function against offline data. With proper regularization, the algorithm enjoys polynomial sample complexity, under only realizability and single-policy concentrability. We also provide alternative analyses based on different assumptions to shed light on the nature of primal-dual algorithms for offline RL.
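The density-ratio idea in the abstract above can be made concrete with a small numerical sketch: if $w(s,a) = d^\pi(s,a)/d^D(s,a)$ is the ratio between the target policy's discounted occupancy and the offline data distribution, then the occupancy-weighted return $E_{d^\pi}[r]$ can be estimated from offline samples alone as an importance-weighted average. The finite state-action space, distributions, and rewards below are hypothetical illustration values, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite state-action space with 6 (s, a) pairs.
d_pi = np.array([0.30, 0.25, 0.20, 0.10, 0.10, 0.05])  # target occupancy d^pi
d_D = np.full(6, 1.0 / 6.0)                            # offline distribution d^D
r = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.5])           # rewards

w = d_pi / d_D                 # density ratio (the "dual variable" being modeled)
exact = d_pi @ r               # E_{d^pi}[r], the quantity we want

# Estimate it using only samples drawn from the offline distribution d^D.
samples = rng.choice(6, size=200_000, p=d_D)
estimate = np.mean(w[samples] * r[samples])
```

Single-policy concentrability corresponds to the ratio `w` being bounded, which is what keeps the variance of this importance-weighted estimator under control.
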
  7. Significant theoretical work has established that in specific regimes, neural networks trained by gradient descent behave like kernel methods. However, in practice, it is known that neural networks strongly outperform their associated kernels. In this work, we explain this gap by demonstrating that there is a large class of functions which cannot be efficiently learned by kernel methods but can be easily learned with gradient descent on a two-layer neural network outside the kernel regime, by learning representations that are relevant to the target task. We also demonstrate that these representations allow for efficient transfer learning, which is impossible in the kernel regime. Specifically, we consider the problem of learning polynomials which depend on only a few relevant directions, i.e. of the form $f^\star(x) = g(Ux)$ where $U: \mathbb{R}^d \to \mathbb{R}^r$ with $d \gg r$. When the degree of $f^\star$ is $p$, it is known that $n \asymp d^p$ samples are necessary to learn $f^\star$ in the kernel regime. Our primary result is that gradient descent learns a representation of the data which depends only on the directions relevant to $f^\star$. This results in an improved sample complexity of $n \asymp d^2 r + d r^p$. Furthermore, in a transfer learning setup where the data distributions in the source and target domain share the same representation $U$ but have different polynomial heads, we show that a popular heuristic for transfer learning has a target sample complexity independent of $d$.
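The function class in the abstract above, multi-index polynomials of the form f(x) = g(Ux), can be illustrated with a short sketch. The head g, the dimensions, and the random U below are hypothetical choices; the point is only that the target depends on x solely through the r-dimensional projection Ux, which is exactly what a learned representation needs to capture.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 50, 2  # ambient dimension much larger than the number of directions

# Random r x d matrix with orthonormal rows (via reduced QR).
U = np.linalg.qr(rng.standard_normal((d, r)))[0].T  # shape (r, d)

def g(z):
    # Degree-3 polynomial head; a hypothetical choice for illustration.
    return z[0] ** 3 + z[0] * z[1] - z[1] ** 2

def f_star(x):
    return g(U @ x)

x = rng.standard_normal(d)
v = rng.standard_normal(d)
v -= U.T @ (U @ v)          # v is now orthogonal to the relevant subspace
# Moving along v leaves f_star unchanged: only the r relevant directions matter.
diff = abs(f_star(x + v) - f_star(x))
```
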
  8. We investigated the dissociation of dications and trications of three polycyclic aromatic hydrocarbons (PAHs), fluorene, phenanthrene, and pyrene. PAHs are a family of molecules ubiquitous in space and involved in much of the chemistry of the interstellar medium. In our experiments, ions are formed by interaction with 30.3 nm extreme ultraviolet (XUV) photons, and their velocity map images are recorded using a PImMS2 multi-mass imaging sensor. Application of recoil-frame covariance analysis allows the total kinetic energy release (TKER) associated with multiple fragmentation channels to be determined to high precision, spanning 1.94–2.60 eV and 2.95–5.29 eV for the dications and trications, respectively. Experimental measurements are supported by Born–Oppenheimer molecular dynamics (BOMD) simulations.
    Free, publicly-accessible full text available October 5, 2023
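For a two-body fragmentation channel of a dication, momentum conservation forces the fragments to recoil back-to-back with equal momentum magnitude, so the TKER mentioned in the abstract follows directly from the measured momentum and the fragment masses: TKER = p^2/2 * (1/m1 + 1/m2). The sketch below applies this relation; the masses and momentum are hypothetical illustration values, not results from the experiment.

```python
# TKER for a two-body break-up with p1 = -p2 (momentum conservation):
#   TKER = p^2/(2*m1) + p^2/(2*m2)
# Fragment masses and momentum below are hypothetical illustration values.

AMU = 1.66053906660e-27   # kg per atomic mass unit
EV = 1.602176634e-19      # J per electronvolt

def tker_two_body(p, m1_amu, m2_amu):
    """Total kinetic energy release (eV) from fragment momentum p (kg*m/s)."""
    m1, m2 = m1_amu * AMU, m2_amu * AMU
    return (p ** 2 / 2.0) * (1.0 / m1 + 1.0 / m2) / EV

# e.g. a 50 amu and a 100 amu fragment sharing |p| = 1e-22 kg*m/s
tker = tker_two_body(1e-22, 50.0, 100.0)
```
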
  9. Abstract Inner-shell photoelectron spectroscopy provides an element-specific probe of molecular structure, as core-electron binding energies are sensitive to the chemical environment. Short-wavelength femtosecond light sources, such as Free-Electron Lasers (FELs), even enable time-resolved site-specific investigations of molecular photochemistry. Here, we study the ultraviolet photodissociation of the prototypical chiral molecule 1-iodo-2-methylbutane, probed by extreme-ultraviolet (XUV) pulses from the Free-electron LASer in Hamburg (FLASH) through the ultrafast evolution of the iodine 4d binding energy. Methodologically, we employ electron-ion partial covariance imaging as a technique to isolate otherwise elusive features in a two-dimensional photoelectron spectrum arising from different photofragmentation pathways. The experimental and theoretical results for the time-resolved electron spectra of the 4d3/2 and 4d5/2 atomic and molecular levels that are disentangled by this method provide a key step towards studying structural and chemical changes from a specific spectator site.
    Free, publicly-accessible full text available December 1, 2023
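The partial-covariance step mentioned in the abstract above has a compact algebraic form: for observables X and Y and a fluctuating parameter I (for an FEL, typically the shot-to-shot pulse intensity), pcov(X, Y; I) = cov(X, Y) - cov(X, I) cov(I, Y) / cov(I, I), which subtracts correlations driven only by the common fluctuation. The synthetic numbers below are hypothetical; the sketch just shows the correction at work.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Shot-to-shot fluctuating parameter (e.g. FEL pulse intensity) that drives
# both observables; X and Y have no direct correlation beyond it.
intensity = rng.standard_normal(n)
X = 2.0 * intensity + 0.5 * rng.standard_normal(n)
Y = 1.5 * intensity + 0.5 * rng.standard_normal(n)

def cov(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

plain = cov(X, Y)   # large: dominated by the common intensity fluctuation
partial = plain - cov(X, intensity) * cov(intensity, Y) / cov(intensity, intensity)
# `partial` is close to zero: the apparent X-Y correlation was intensity-driven.
```
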