Search for: All records

Creators/Authors contains: "Singh, Gaurav"


  1. Unary computing is a relatively new method for implementing arbitrary nonlinear functions that uses unpacked thermometer number encoding, enabling much lower hardware costs. In its original form, unary computing provides no trade-off between accuracy and hardware cost. In this work, we propose a novel self-similarity-based method that optimizes previous hybrid binary-unary work and gives it a trade-off between accuracy and hardware cost by introducing controlled levels of approximation. Looking for self-similarity between different parts of a function allows us to implement only a very small set of core unique subfunctions and derive the remaining subfunctions from this core using simple linear transformations. We compare our method to previous works such as FloPoCo-LUT (lookup table), HBU (hybrid binary-unary), and FloPoCo-PPA (piecewise polynomial approximation) on several 8–12-bit nonlinear functions, including Log, Exp, Sigmoid, GELU, Sin, and Sqr, which are frequently used in neural networks and image processing applications. The area × delay hardware cost of our method is on average 32%–60% lower than that of previous methods in both exact and approximate implementations. We also extend our method to multivariate nonlinear functions and show on average 78%–92% improvement over previous work.
    Free, publicly-accessible full text available September 1, 2025
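    The unpacked thermometer encoding and the self-similarity idea can be illustrated with a small sketch. The snippet below is a hypothetical software model, not the paper's implementation: `to_thermometer` shows the unary encoding, and `affine_match` checks whether one sampled segment of a function can be derived from another by a simple linear transformation, the property the method exploits to keep only a small core of unique subfunctions.

```python
# Hypothetical sketch of unpacked thermometer (unary) encoding and of the
# self-similarity check described above; illustrative only, not the paper's
# implementation.

def to_thermometer(value, width):
    """Encode an integer 0..width as an unpacked thermometer bit-vector."""
    return [1 if i < value else 0 for i in range(width)]

def affine_match(seg_a, seg_b):
    """Return (scale, offset) if seg_b == scale*seg_a + offset element-wise,
    i.e. seg_b can be derived from the core segment seg_a by a simple
    linear transformation; otherwise return None."""
    for scale in (1, -1):                      # trivial scalings only
        offset = seg_b[0] - scale * seg_a[0]
        if all(scale * a + offset == b for a, b in zip(seg_a, seg_b)):
            return scale, offset
    return None

# Example: a "core" segment implemented once, and a second segment that is
# recovered from it as -1*core + 18, so it needs no hardware of its own.
core    = [3, 5, 8, 12]
derived = [15, 13, 10, 6]
print(to_thermometer(3, 6))          # [1, 1, 1, 0, 0, 0]
print(affine_match(core, derived))   # (-1, 18)
```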
  2. With deep learning models ever ballooning in size to push state-of-the-art accuracy improvements, efforts to find compact models have become necessary. To meet this objective, we propose a novel operation called Personal Self-Attention (PSA). It is designed specifically to learn non-linear 1-D functions faster than existing architectures such as Multi-Layer Perceptrons (MLPs) and polynomial-based methods, while being highly compatible with gradient backpropagation. We show that by stacking and combining these non-linear functions with linear transformations, we can achieve the same accuracy as a larger model but with a significantly smaller hidden dimension. To test our contribution, we implemented PSA in an MLP-based vision model called ResMLP and evaluated it on vision classification tasks on the SVHN and CIFAR-10 datasets. We show how PSA pushes the Pareto front, achieving the same accuracy with 2–6× smaller hidden-dimension sizes than conventional MLP structures. Further, by quantizing our non-linear function, PSA can be mapped to a simple lookup table, allowing for very efficient translation to FPGA hardware. We demonstrate this by designing an unrolled high-throughput accelerator for ResMLP that uses nearly 1.5× fewer DSPs with PSA than a conventional MLP architecture while achieving the same accuracy of 86% and throughput of 29k FPS.
    Free, publicly-accessible full text available June 26, 2025
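    The abstract does not spell out the PSA operator itself, so the following is only a hedged sketch of the general pattern it describes, with hypothetical class and parameter names: a trainable 1-D non-linearity that is compatible with backpropagation and, once trained, can be frozen into a small lookup table for FPGA mapping.

```python
# Hedged sketch of the general pattern described above (names and structure
# are hypothetical; this is not the PSA definition from the paper): a
# trainable 1-D non-linearity that works with backpropagation and can be
# frozen into a lookup table after training.
import torch
import torch.nn as nn

class Learned1DFunction(nn.Module):
    def __init__(self, num_points=16, x_min=-4.0, x_max=4.0):
        super().__init__()
        self.register_buffer("xs", torch.linspace(x_min, x_max, num_points))
        # Trainable sample values; initialized to the identity function.
        self.ys = nn.Parameter(torch.linspace(x_min, x_max, num_points))

    def forward(self, x):
        # Piecewise-linear interpolation between trainable samples, so
        # gradients flow into self.ys during backpropagation.
        x = x.clamp(float(self.xs[0]), float(self.xs[-1]))
        idx = torch.bucketize(x, self.xs).clamp(1, len(self.xs) - 1)
        x0, x1 = self.xs[idx - 1], self.xs[idx]
        y0, y1 = self.ys[idx - 1], self.ys[idx]
        t = (x - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    def to_lut(self, bits=8):
        # After training, freeze the function into a 2**bits-entry table,
        # which is what makes an FPGA lookup-table mapping straightforward.
        grid = torch.linspace(float(self.xs[0]), float(self.xs[-1]), 2 ** bits)
        return self.forward(grid).detach()

f = Learned1DFunction()
y = f(torch.randn(8))        # differentiable forward pass
lut = f.to_lut(bits=8)       # 256-entry table for hardware mapping
```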
  3. Free, publicly-accessible full text available May 13, 2025
  4. Neural Radiance Fields (NeRF) have become an increasingly popular representation for capturing high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work in generalization and multi-view consistency, and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings, resulting in significant quality gains. To improve quality even further, we incorporate a denoise-and-finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and fine-tunes the estimated NeRF while retaining multi-view consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks, including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating state-of-the-art results.
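    For context on the hypernetwork mechanism mentioned above, here is a minimal sketch of the generic pattern only; it is not HyP-NeRF's actual architecture, which additionally predicts multi-resolution hash encodings and uses a NeRF as the target network. All names and sizes are assumptions.

```python
# Generic hypernetwork pattern for context (hypothetical sketch; HyP-NeRF
# additionally predicts multi-resolution hash encodings and uses a NeRF as
# the target network, which is not reproduced here).
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    def __init__(self, latent_dim=64, target_in=3, target_hidden=32, target_out=4):
        super().__init__()
        # Shapes of the target network's two weight matrices.
        self.shapes = [(target_hidden, target_in), (target_out, target_hidden)]
        n_params = sum(r * c for r, c in self.shapes)
        # The hypernetwork maps a per-object latent code to all target weights.
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_params))

    def forward(self, z, x):
        flat = self.net(z)
        w1, w2 = flat.split([r * c for r, c in self.shapes])
        w1 = w1.view(self.shapes[0])
        w2 = w2.view(self.shapes[1])
        h = torch.relu(x @ w1.t())   # target network, layer 1
        return h @ w2.t()            # target network, layer 2

hyper = HyperNet()
z = torch.randn(64)                  # latent code for one object instance
x = torch.randn(128, 3)              # query points for the target network
out = hyper(z, x)                    # shape (128, 4)
```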
  5. Unary computing is a relatively new method for implementing non-linear functions using few hardware resources compared to binary computing. In its original form, unary computing provides no trade-off between accuracy and hardware cost. In this work, we propose a novel self-similarity-based method that optimizes the previous hybrid binary-unary method and gives it a trade-off between accuracy and hardware cost by introducing controlled levels of approximation. Given a target maximum error, our method breaks a function into sub-functions and tries to find the minimum set of unique sub-functions from which all the others can be derived through trivial bit-wise transformations. We compare our method to previous works such as HBU (hybrid binary-unary) and FloPoCo-PPA (piecewise polynomial approximation) on a number of non-linear functions, including Log, Exp, Sigmoid, GELU, Sin, and Sqr, which are used in neural networks and image processing applications. Without any loss of accuracy, our method improves the area-delay-product hardware cost of HBU on average by 7% at 8-bit, 20% at 10-bit, and 35% at 12-bit resolutions. When an approximation error of one least significant bit is allowed, our method reduces the hardware cost of HBU on average by 21% at 8-bit, 49% at 10-bit, and 60% at 12-bit resolutions, and using the same error budget as given to FloPoCo-PPA, it reduces the hardware cost of FloPoCo-PPA on average by 79% at 8-bit, 58% at 10-bit, and 9% at 12-bit resolutions. We finally show the benefits of our method by implementing a 10-bit homomorphic filter, which is used in image processing applications. Our method implements the filter with no quality loss at a lower hardware cost than previous approximate and exact methods can achieve.
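    To make the sub-function selection concrete, here is an illustrative sketch rather than the paper's algorithm: given a target maximum error, it keeps a segment as a new "core" only when no already-chosen core can reproduce it, within the error budget, through a trivial transformation. Reversal and negation stand in here for the bit-wise transformations mentioned above.

```python
# Illustrative sketch (not the paper's algorithm) of choosing a minimum set
# of "core" segments under a maximum-error budget.  Reversal and negation
# stand in for the trivial bit-wise transformations mentioned above.
import math

def max_err(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def transforms(seg):
    """Trivial transformations considered here: identity, reversal,
    negation, and their composition."""
    for s in (seg, list(reversed(seg))):
        yield s
        yield [-v for v in s]

def pick_cores(segments, err_budget):
    cores = []
    for seg in segments:
        covered = any(max_err(seg, t) <= err_budget
                      for core in cores for t in transforms(core))
        if not covered:
            cores.append(seg)   # no existing core reproduces this segment
    return cores

# Example: sin(x) over a full period, split into four quarter-wave segments.
# A single core segment reproduces the other three via reversal/negation.
n, scale = 16, 100
quarters = [[round(scale * math.sin((q * n + i + 0.5) * (math.pi / 2) / n))
             for i in range(n)] for q in range(4)]
print(len(pick_cores(quarters, err_budget=2)))   # -> 1
```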
  6. Modular active cell robots (MACROs) are a design paradigm for modular robotic hardware that uses only two components, namely actuators and passive compliant joints. Under the MACRO approach, a large number of actuators and joints are connected to create mesh-like cellular robotic structures that can be actuated to achieve large deformation and shape change. In this two-part paper, we study the importance of different possible mesh topologies within the MACRO framework. Regular and semi-regular tilings of the plane are used as the candidate mesh topologies and simulated using finite element analysis (FEA). In Part 1, we use FEA to evaluate their passive stiffness characteristics. Using a strain-energy method, the homogenized material properties (Young's modulus, shear modulus, and Poisson's ratio) of the different mesh topologies are computed and compared. The results show that the stiffnesses increase with increasing nodal connectivity and that stretching-dominated topologies have higher stiffness compared to bending-dominated ones. We also investigate the role of relative actuator-node stiffness on the overall mesh characteristics. This analysis shows that the stiffness of stretching-dominated topologies scales directly with their cross-section area whereas bending-dominated ones do not have such a direct relationship.
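    The strain-energy homogenization step typically equates the FEA strain energy stored in a unit cell with that of an equivalent homogeneous solid under the same prescribed strain, with Poisson's ratio taken from the ratio of transverse to axial strain in the same load case. One common form of this equivalence, shown below, is only a sketch of the method; the paper's exact loading and boundary conditions may differ.

```latex
% One common form of the strain-energy equivalence used to extract effective
% (homogenized) moduli from a unit cell of volume V; the paper's exact
% loading and boundary conditions may differ.
U_{\mathrm{axial}} = \tfrac{1}{2}\, E^{*} V \varepsilon^{2}
  \;\Longrightarrow\; E^{*} = \frac{2\,U_{\mathrm{axial}}}{V \varepsilon^{2}},
\qquad
U_{\mathrm{shear}} = \tfrac{1}{2}\, G^{*} V \gamma^{2}
  \;\Longrightarrow\; G^{*} = \frac{2\,U_{\mathrm{shear}}}{V \gamma^{2}}
```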
  7. Modular active cell robots (MACROs) are a design approach in which a large number of linear actuators and passive compliant joints are assembled to create an active structure with a repeating unit cell. Such a mesh-like robotic structure can be actuated to achieve large deformation and shape change. In this two-part paper, we use finite element analysis (FEA) to model the deformation behavior of different MACRO mesh topologies and evaluate their passive and active mechanical characteristics. In Part I, we presented the passive stiffness characteristics of different MACRO meshes. In this Part II, we investigate the active strain characteristics of planar MACRO meshes. Using FEA, we quantify and compare the strains generated for a specific choice of MACRO mesh topology and, further, for the specific choice of actuators actuated within that mesh. We simulate a series of actuation modes based on the angular orientation of the actuators within the mesh and show that such actuation modes result in deformation that is independent of the size of the mesh. We also show that a subset of these actuation modes spans the range of deformation behavior. Finally, we compare the actuation effort required to actuate different MACRO meshes and show that the actuation effort is related to the nodal connectivity of the mesh.
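    As a toy illustration of grouping actuators into orientation-based actuation modes (a hypothetical helper, not the paper's simulation setup), one could bucket the edges of a planar mesh by their angle:

```python
# Hypothetical sketch (not the paper's code): group the actuators (edges) of
# a planar mesh by their angular orientation, so each group can be driven
# together as one "actuation mode".
import math
from collections import defaultdict

def actuation_modes(nodes, edges, tol_deg=1.0):
    """nodes: {id: (x, y)}, edges: [(i, j), ...] -> {angle_deg: [edge, ...]}"""
    modes = defaultdict(list)
    for i, j in edges:
        (x1, y1), (x2, y2) = nodes[i], nodes[j]
        ang = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        key = round(ang / tol_deg) * tol_deg      # bucket nearby angles
        modes[key].append((i, j))
    return dict(modes)

# A single triangular cell: three edge orientations -> three actuation modes.
nodes = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.5, math.sqrt(3) / 2)}
edges = [(0, 1), (1, 2), (0, 2)]
print(actuation_modes(nodes, edges))   # keys ~ 0.0, 60.0, 120.0 degrees
```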
  8. We propose a novel method for approximate hardware implementation of univariate math functions with significantly fewer hardware resources compared to previous approaches. Examples of such functions include exp(x) and the activation function GELU(x), both used in transformer networks; gamma(x), which is used in image processing; and other functions such as tanh(x), cosh(x), sq(x), and sqrt(x). The method builds on previous work on hybrid binary-unary computing. The novelty of our approach is that we break a function into a number of sub-functions such that implementing each sub-function becomes cheap and converting the output of the sub-functions to binary becomes almost trivial. Our method also uses self-similarity in functions to further reduce the cost. We compare our method to conventional binary, previous stochastic computing, and hybrid binary-unary methods on several functions at 8-, 12-, and 16-bit resolutions. While preserving high accuracy, our method outperforms previous works in terms of hardware cost; e.g., tolerating less than 0.01 mean absolute error, our method reduces the (area × latency) cost on average by 5, 7, and 2 orders of magnitude compared to the conventional binary, stochastic computing, and hybrid binary-unary methods, respectively. Finally, we demonstrate the potential benefits of our method for natural language processing and image processing applications by deploying it to implement major blocks in an encoder layer of the BERT language model, as well as the Roberts Cross edge detection algorithm, both of which involve non-linear functions.
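    A simplified software model of the hybrid binary-unary split that this line of work builds on is sketched below. It is illustrative only; the function, bit widths, and table construction are assumptions, and the real designs are hardware circuits. The upper input bits select a sub-function in binary, while the lower bits are thermometer-encoded and the sub-function output is accumulated in the unary domain.

```python
# Simplified software model of the hybrid binary-unary split (illustrative;
# not the paper's hardware generator).  MSBs select a sub-function in binary;
# LSBs are thermometer-encoded and evaluated by accumulating per-bit
# increments, which is what a unary circuit can do cheaply.
import math

TOTAL_BITS, LSB_BITS = 8, 4
SCALE = 255                                  # 8-bit output range

def build_subfunctions(func):
    """For every MSB region, tabulate the base output value and the increment
    contributed by each thermometer bit (quantized function differences)."""
    n = 1 << TOTAL_BITS
    f = [round(SCALE * func(i / n)) for i in range(n + 1)]
    regions = []
    for msb in range(1 << (TOTAL_BITS - LSB_BITS)):
        base = msb << LSB_BITS
        increments = [f[base + k + 1] - f[base + k] for k in range(1 << LSB_BITS)]
        regions.append((f[base], increments))
    return regions

def hybrid_eval(x, regions):
    msb, lsb = x >> LSB_BITS, x & ((1 << LSB_BITS) - 1)
    thermometer = [1] * lsb + [0] * ((1 << LSB_BITS) - lsb)   # unary encoding
    base, increments = regions[msb]                            # binary select
    return base + sum(b * d for b, d in zip(thermometer, increments))

# Check against direct evaluation of the quantized function.
regions = build_subfunctions(lambda t: math.sin(t * math.pi / 2))
x = 137
print(hybrid_eval(x, regions), round(SCALE * math.sin((x / 256) * math.pi / 2)))
```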