NSF PAR Search | NSF Public Access Repository

Simple linear attention language models balance the recall-throughput tradeoff

Arora, Simran; Eyuboglu, Sabri; Zhang, Michael; Timalsina, Aman; Alberti, Silas; Zou, James; Rudra, Atri; Re, Christopher (July 2024, Proceedings of the 41st International Conference on Machine Learning)

Full Text Available
Fast Algorithms for a New Relaxation of Optimal Transport

Charikar, Moses; Chen, Beidi; Re, Christopher; Waingarten, Erik (July 2023, Proceedings of Machine Learning Research)

Full Text Available
Skill-it! A data-driven skills framework for understanding and training language models

Chen, Mayee F; Roberts, Nicholas; Bhatia, Kush; Wang, Jue; Zhang, Ce; Sala, Frederic; Re, Christopher (November 2023, Advances in neural information processing systems)

Full Text Available
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

Massaroli, Stefano; Poli, Michael; Fu, Daniel Y.; Kumbong, Hermann; Parnichkun, Rom N.; Timalsina, Aman; Romero, David W.; McIntyre, Quinn; Chen, Beidi; Rudra, Atri; et al (December 2023, Proceedings of the 36th Neural Information Processing Systems Conference (NeurIPS))

Full Text Available
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models

Chen, Beidi; Dao, Tri; Liang, Kaizhao; Yang, Jiaming; Song, Zhao; Rudra, Atri; Re, Christopher (January 2022, International Conference on Learning Representations (ICLR))

Full Text Available
Monarch: Expressive Structured Matrices for Efficient and Accurate Training

Dao, Tri; Chen, Beidi; Sohoni, Nimit S.; Desai, Arjun; Poli, Michael; Grogan, Jessica; Liu, Alexander; Rao, Aniruddh; Rudra, Atri; Re, Christopher (January 2022, Proceedings of the 39th International Conference on Machine Learning)

Full Text Available
HiPPO: Recurrent Memory with Optimal Polynomial Projections

Gu, Albert; Dao, Tri; Ermon, Stefano; Rudra, Atri; Re, Christopher (December 2020, Advances in neural information processing systems)
A central problem in learning from sequential data is representing cumulative history in an incremental fashion as more data is processed. We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases. Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal solution to a natural online function approximation problem. As special cases, our framework yields a short derivation of the recent Legendre Memory Unit (LMU) from first principles, and generalizes the ubiquitous gating mechanism of recurrent neural networks such as GRUs. This formal framework yields a new memory update mechanism (HiPPO-LegS) that scales through time to remember all history, avoiding priors on the timescale. HiPPO-LegS enjoys the theoretical benefits of timescale robustness, fast updates, and bounded gradients. By incorporating the memory dynamics into recurrent neural networks, HiPPO RNNs can empirically capture complex temporal dependencies. On the benchmark permuted MNIST dataset, HiPPO-LegS sets a new state-of-the-art accuracy of 98.3%. Finally, on a novel trajectory classification task testing robustness to out-of-distribution timescales and missing data, HiPPO-LegS outperforms RNN and neural ODE baselines by 25-40% accuracy.
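The online-compression idea can be illustrated with a small sketch of a HiPPO-LegS style update. The abstract does not state the matrices, so the forms of A and B below, the forward-Euler recurrence c_{k+1} = (I - A/k) c_k + (1/k) B f_k, and the function names legs_matrices and legs_compress are assumptions made for illustration, not a definitive implementation of the authors' method.

```python
import math

def legs_matrices(n):
    """Assumed LegS transition matrix A (n x n, lower triangular) and input vector B."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            A[i][j] = math.sqrt((2 * i + 1) * (2 * j + 1))
        A[i][i] = i + 1.0
    B = [math.sqrt(2 * i + 1) for i in range(n)]
    return A, B

def legs_compress(signal, n):
    """Compress a 1-D signal into n coefficients, one sample at a time.

    Uses the forward-Euler recurrence c_{k+1} = (I - A/k) c_k + (1/k) B f_k
    with k = 1, 2, ... (an assumed discretization, chosen for brevity).
    """
    A, B = legs_matrices(n)
    c = [0.0] * n
    for k, f_k in enumerate(signal, start=1):
        Ac = [sum(a * x for a, x in zip(row, c)) for row in A]
        c = [ci - aci / k + b * f_k / k for ci, aci, b in zip(c, Ac, B)]
    return c

# A constant signal is a degree-0 polynomial, so only the first
# coefficient should be non-negligible.
coeffs = legs_compress([1.0] * 50, n=4)
```

Note that the step size 1/k shrinks as more samples arrive, which is one way to read the abstract's claim that the memory "scales through time" without a fixed timescale prior.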
Full Text Available
Scatterbrain: Unifying Sparse and Low-rank Attention

Chen, Beidi; Dao, Tri; Winsor, Eric; Song, Zhao; Rudra, Atri; Re, Christopher (January 2021, Advances in neural information processing systems)

Full Text Available
Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers

Gu, Albert; Johnson, Isys; Goel, Karan; Saab, Khaled; Dao, Tri; Rudra, Atri; Re, Christopher (January 2021, Advances in neural information processing systems)

Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence u ↦ y by simply simulating a linear continuous-time state-space representation ẋ = Ax + Bu, y = Cx + Du. Theoretically, we show that LSSL models are closely related to the three aforementioned families of models and inherit their strengths. For example, they generalize convolutions to continuous-time, explain common RNN heuristics, and share features of NDEs such as time-scale adaptation. We then incorporate and generalize recent theory on continuous-time memorization to introduce a trainable subset of structured matrices A that endow LSSLs with long-range memory. Empirically, stacking LSSL layers into a simple deep neural network obtains state-of-the-art results across time series benchmarks for long dependencies in sequential image classification, real-world healthcare regression tasks, and speech. On a difficult speech classification task with length-16000 sequences, LSSL outperforms prior approaches by 24 accuracy points, and even outperforms baselines that use hand-crafted features on 100x shorter sequences.
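The link between the recurrent and convolutional views of a linear state-space model can be sketched in a few lines. This is a minimal illustration, assuming a single-input, single-output system, a zero initial state, and a forward-Euler discretization chosen for brevity (not necessarily the discretization used in the paper); the helper names simulate_ssm, ssm_kernel, and convolve_causal and the example matrices are hypothetical.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def simulate_ssm(A, B, C, D, u, dt):
    """Recurrent view: step x' = Ax + Bu with forward Euler, emit y = Cx + Du."""
    x = [0.0] * len(A)
    ys = []
    for u_k in u:
        ys.append(sum(c * xi for c, xi in zip(C, x)) + D * u_k)
        Ax = matvec(A, x)
        x = [xi + dt * (ax + b * u_k) for xi, ax, b in zip(x, Ax, B)]
    return ys

def ssm_kernel(A, B, C, dt, length):
    """Convolutional view: K[j] = C (I + dt A)^j (dt B)."""
    v = [dt * b for b in B]
    K = []
    for _ in range(length):
        K.append(sum(c * vi for c, vi in zip(C, v)))
        Av = matvec(A, v)
        v = [vi + dt * av for vi, av in zip(v, Av)]
    return K

def convolve_causal(K, u, D):
    """y[k] = D u[k] + sum over j < k of K[k-1-j] u[j]."""
    return [D * u[k] + sum(K[k - 1 - j] * u[j] for j in range(k))
            for k in range(len(u))]

# Hypothetical 2-state system and input, used only to compare the two views.
A = [[-1.0, 0.0], [1.0, -2.0]]
B = [1.0, 0.5]
C = [0.3, 1.0]
D = 0.1
dt = 0.05
u = [math.sin(0.1 * k) for k in range(40)]

y_recurrent = simulate_ssm(A, B, C, D, u, dt)
y_convolution = convolve_causal(ssm_kernel(A, B, C, dt, len(u)), u, D)
```

Both outputs agree up to floating-point rounding, because unrolling the recurrence from a zero state gives exactly the causal convolution of the input with the kernel.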
Full Text Available
Sparse Recovery for Orthogonal Polynomial Transforms

https://doi.org/10.4230/LIPIcs.ICALP.2020.58

Gilbert, Anna; Gu, Albert; Re, Christopher; Rudra, Atri; Wootters, Mary (June 2020, Leibniz international proceedings in informatics)
In this paper we consider the following sparse recovery problem. We have query access to a vector 𝐱 ∈ ℝ^N such that x̂ = 𝐅 𝐱 is k-sparse (or nearly k-sparse) for some orthogonal transform 𝐅. The goal is to output an approximation (in an 𝓁₂ sense) to x̂ in sublinear time. This problem has been well-studied in the special case that 𝐅 is the Discrete Fourier Transform (DFT), and a long line of work has resulted in sparse Fast Fourier Transforms that run in time O(k ⋅ polylog N). However, for transforms 𝐅 other than the DFT (or closely related transforms like the Discrete Cosine Transform), the question is much less settled. In this paper we give sublinear-time algorithms - running in time poly(k log(N)) - for solving the sparse recovery problem for orthogonal transforms 𝐅 that arise from orthogonal polynomials. More precisely, our algorithm works for any 𝐅 that is an orthogonal polynomial transform derived from Jacobi polynomials. The Jacobi polynomials are a large class of classical orthogonal polynomials (and include Chebyshev and Legendre polynomials as special cases), and show up extensively in applications like numerical analysis and signal processing. One caveat of our work is that we require an assumption on the sparsity structure of the sparse vector, although we note that vectors with random support have this property with high probability. Our approach is to give a very general reduction from the k-sparse sparse recovery problem to the 1-sparse sparse recovery problem that holds for any flat orthogonal polynomial transform; then we solve this one-sparse recovery problem for transforms derived from Jacobi polynomials. Frequently, sparse FFT algorithms are described as implementing such a reduction; however, the technical details of such works are quite specific to the Fourier transform and moreover the actual implementations of these algorithms do not use the 1-sparse algorithm as a black box. 
In this work we give a reduction that works for a broad class of orthogonal polynomial families, and which uses any 1-sparse recovery algorithm as a black box.
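To make the 1-sparse building block concrete, here is a sketch for the well-studied Fourier special case mentioned in the abstract, not the Jacobi-polynomial algorithm of the paper. It assumes the unnormalized convention x̂[f] = Σ_t x[t] e^{-2πi f t / n}, exact (noise-free) query access, and a nonzero coefficient; the names one_sparse_dft_recover and query are hypothetical.

```python
import cmath

def one_sparse_dft_recover(query, n):
    """Recover (frequency, coefficient) when the DFT of x has one nonzero entry.

    If x_hat = a * e_f, then x[t] = (a / n) * exp(2*pi*i*f*t / n), so the
    ratio x[1] / x[0] reveals f and x[0] reveals a. Only two queries are used.
    """
    x0 = query(0)
    x1 = query(1)
    phase = cmath.phase(x1 / x0)
    f = round(phase * n / (2 * cmath.pi)) % n
    return f, x0 * n

# Hypothetical test signal: a single active frequency with a complex amplitude.
n = 64
true_f = 37
true_a = 2.5 - 1.0j

def query(t):
    return (true_a / n) * cmath.exp(2j * cmath.pi * true_f * t / n)

f_rec, a_rec = one_sparse_dft_recover(query, n)
```

The running time here is independent of n, which is the sense in which a 1-sparse solver is a cheap primitive; a k-sparse algorithm built by reduction would call a routine of this kind as a black box on suitably filtered versions of the signal.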
Full Text Available
