Search for: All records

Creators/Authors contains: "Singh, Gaurav"


  1. Unary computing is a relatively new method for implementing arbitrary nonlinear functions that uses unpacked thermometer number encoding, enabling much lower hardware costs. In its original form, unary computing provides no trade-off between accuracy and hardware cost. In this work, we propose a novel self-similarity-based method that optimizes previous hybrid binary-unary work and gives it a trade-off between accuracy and hardware cost by introducing controlled levels of approximation. Looking for self-similarity between different parts of a function allows us to implement only a very small set of core unique subfunctions and derive the remaining subfunctions from this core using simple linear transformations. We compare our method to previous works such as FloPoCo-LUT (lookup table), HBU (hybrid binary-unary), and FloPoCo-PPA (piecewise polynomial approximation) on several 8–12-bit nonlinear functions, including Log, Exp, Sigmoid, GELU, Sin, and Sqr, which are frequently used in neural networks and image processing applications. The area × delay hardware cost of our method is on average 32%–60% lower than that of previous methods in both exact and approximate implementations. We also extend our method to multivariate nonlinear functions and show on average 78%–92% improvement over previous work.
    Free, publicly-accessible full text available September 1, 2025
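    The unpacked thermometer encoding and the self-similarity idea can be illustrated with a small sketch. The snippet below is a hypothetical software model, not the paper's implementation: `to_thermometer` shows the unary encoding, and `affine_match` checks whether one sampled segment of a function can be derived from another by a simple linear transformation, the property the method exploits to keep only a small core of unique subfunctions.

```python
# Hypothetical sketch of unpacked thermometer (unary) encoding and of the
# self-similarity check described above; illustrative only, not the paper's
# implementation.

def to_thermometer(value, width):
    """Encode an integer 0..width as an unpacked thermometer bit-vector."""
    return [1 if i < value else 0 for i in range(width)]

def affine_match(seg_a, seg_b):
    """Return (scale, offset) if seg_b == scale*seg_a + offset element-wise,
    i.e. seg_b can be derived from the core segment seg_a by a simple
    linear transformation; otherwise return None."""
    for scale in (1, -1):                      # trivial scalings only
        offset = seg_b[0] - scale * seg_a[0]
        if all(scale * a + offset == b for a, b in zip(seg_a, seg_b)):
            return scale, offset
    return None

# Example: a "core" segment implemented once, and a second segment that is
# recovered from it as -1*core + 18, so it needs no hardware of its own.
core    = [3, 5, 8, 12]
derived = [15, 13, 10, 6]
print(to_thermometer(3, 6))          # [1, 1, 1, 0, 0, 0]
print(affine_match(core, derived))   # (-1, 18)
```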
  2. With deep learning models ever ballooning in size to push state-of-the-art accuracy improvements, efforts to find compact models have become necessary. To meet this objective, we propose a novel operation called Personal Self-Attention (PSA). It is designed specifically to learn non-linear 1-D functions faster than existing architectures such as Multi-Layer Perceptrons (MLPs) and polynomial-based methods, while being highly compatible with gradient backpropagation. We show that by stacking and combining these non-linear functions with linear transformations, we can achieve the same accuracy as a larger model but with a significantly smaller hidden dimension. To test our contribution, we implemented PSA in an MLP-based vision model called ResMLP and evaluated it on vision classification tasks on the SVHN and CIFAR-10 datasets. We show how PSA pushes the Pareto front, achieving the same accuracy with 2–6× smaller hidden-dimension sizes than conventional MLP structures. Further, by quantizing our non-linear function, PSA can be mapped to a simple lookup table, allowing for very efficient translation to FPGA hardware. We demonstrate this by designing an unrolled high-throughput accelerator for ResMLP that uses nearly 1.5× fewer DSPs with PSA than a conventional MLP architecture while achieving the same accuracy of 86% and throughput of 29k FPS.
    Free, publicly-accessible full text available June 26, 2025
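    The abstract does not spell out the PSA operator itself, so the following is only a hedged sketch of the general pattern it describes, with hypothetical class and parameter names: a trainable 1-D non-linearity that is compatible with backpropagation and, once trained, can be frozen into a small lookup table for FPGA mapping.

```python
# Hedged sketch of the general pattern described above (names and structure
# are hypothetical; this is not the PSA definition from the paper): a
# trainable 1-D non-linearity that works with backpropagation and can be
# frozen into a lookup table after training.
import torch
import torch.nn as nn

class Learned1DFunction(nn.Module):
    def __init__(self, num_points=16, x_min=-4.0, x_max=4.0):
        super().__init__()
        self.register_buffer("xs", torch.linspace(x_min, x_max, num_points))
        # Trainable sample values; initialized to the identity function.
        self.ys = nn.Parameter(torch.linspace(x_min, x_max, num_points))

    def forward(self, x):
        # Piecewise-linear interpolation between trainable samples, so
        # gradients flow into self.ys during backpropagation.
        x = x.clamp(float(self.xs[0]), float(self.xs[-1]))
        idx = torch.bucketize(x, self.xs).clamp(1, len(self.xs) - 1)
        x0, x1 = self.xs[idx - 1], self.xs[idx]
        y0, y1 = self.ys[idx - 1], self.ys[idx]
        t = (x - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    def to_lut(self, bits=8):
        # After training, freeze the function into a 2**bits-entry table,
        # which is what makes an FPGA lookup-table mapping straightforward.
        grid = torch.linspace(float(self.xs[0]), float(self.xs[-1]), 2 ** bits)
        return self.forward(grid).detach()

f = Learned1DFunction()
y = f(torch.randn(8))        # differentiable forward pass
lut = f.to_lut(bits=8)       # 256-entry table for hardware mapping
```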
  3. Free, publicly-accessible full text available May 13, 2025
  4. Neural Radiance Fields (NeRF) have become an increasingly popular representation for capturing high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work in generalization and multi-view consistency, and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings, resulting in significant quality gains. To improve quality even further, we incorporate a denoise-and-finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and fine-tunes the estimated NeRF while retaining multi-view consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks, including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating state-of-the-art results.
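    For context on the hypernetwork mechanism mentioned above, here is a minimal sketch of the generic pattern only; it is not HyP-NeRF's actual architecture, which additionally predicts multi-resolution hash encodings and uses a NeRF as the target network. All names and sizes are assumptions.

```python
# Generic hypernetwork pattern for context (hypothetical sketch; HyP-NeRF
# additionally predicts multi-resolution hash encodings and uses a NeRF as
# the target network, which is not reproduced here).
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    def __init__(self, latent_dim=64, target_in=3, target_hidden=32, target_out=4):
        super().__init__()
        # Shapes of the target network's two weight matrices.
        self.shapes = [(target_hidden, target_in), (target_out, target_hidden)]
        n_params = sum(r * c for r, c in self.shapes)
        # The hypernetwork maps a per-object latent code to all target weights.
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_params))

    def forward(self, z, x):
        flat = self.net(z)
        w1, w2 = flat.split([r * c for r, c in self.shapes])
        w1 = w1.view(self.shapes[0])
        w2 = w2.view(self.shapes[1])
        h = torch.relu(x @ w1.t())   # target network, layer 1
        return h @ w2.t()            # target network, layer 2

hyper = HyperNet()
z = torch.randn(64)                  # latent code for one object instance
x = torch.randn(128, 3)              # query points for the target network
out = hyper(z, x)                    # shape (128, 4)
```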
  5. Unary computing is a relatively new method for implementing non-linear functions using few hardware resources compared to binary computing. In its original form, unary computing provides no trade-off between accuracy and hardware cost. In this work, we propose a novel self-similarity-based method that optimizes the previous hybrid binary-unary method and gives it a trade-off between accuracy and hardware cost by introducing controlled levels of approximation. Given a target maximum error, our method breaks a function into sub-functions and tries to find the minimum set of unique sub-functions from which all the others can be derived through trivial bit-wise transformations. We compare our method to previous works such as HBU (hybrid binary-unary) and FloPoCo-PPA (piecewise polynomial approximation) on a number of non-linear functions, including Log, Exp, Sigmoid, GELU, Sin, and Sqr, which are used in neural networks and image processing applications. Without any loss of accuracy, our method improves the area-delay-product hardware cost of HBU on average by 7% at 8-bit, 20% at 10-bit, and 35% at 12-bit resolutions. When an approximation error of one least significant bit is allowed, our method reduces the hardware cost of HBU on average by 21% at 8-bit, 49% at 10-bit, and 60% at 12-bit resolutions, and using the same error budget as given to FloPoCo-PPA, it reduces the hardware cost of FloPoCo-PPA on average by 79% at 8-bit, 58% at 10-bit, and 9% at 12-bit resolutions. We finally show the benefits of our method by implementing a 10-bit homomorphic filter, which is used in image processing applications. Our method implements the filter with no quality loss at a lower hardware cost than previous approximate and exact methods can achieve.
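    To make the sub-function selection concrete, here is an illustrative sketch rather than the paper's algorithm: given a target maximum error, it keeps a segment as a new "core" only when no already-chosen core can reproduce it, within the error budget, through a trivial transformation. Reversal and negation stand in here for the bit-wise transformations mentioned above.

```python
# Illustrative sketch (not the paper's algorithm) of choosing a minimum set
# of "core" segments under a maximum-error budget.  Reversal and negation
# stand in for the trivial bit-wise transformations mentioned above.
import math

def max_err(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def transforms(seg):
    """Trivial transformations considered here: identity, reversal,
    negation, and their composition."""
    for s in (seg, list(reversed(seg))):
        yield s
        yield [-v for v in s]

def pick_cores(segments, err_budget):
    cores = []
    for seg in segments:
        covered = any(max_err(seg, t) <= err_budget
                      for core in cores for t in transforms(core))
        if not covered:
            cores.append(seg)   # no existing core reproduces this segment
    return cores

# Example: sin(x) over a full period, split into four quarter-wave segments.
# A single core segment reproduces the other three via reversal/negation.
n, scale = 16, 100
quarters = [[round(scale * math.sin((q * n + i + 0.5) * (math.pi / 2) / n))
             for i in range(n)] for q in range(4)]
print(len(pick_cores(quarters, err_budget=2)))   # -> 1
```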
  6. Modular active cell robots (MACROs) are a design paradigm for modular robotic hardware that uses only two components, namely actuators and passive compliant joints. Under the MACRO approach, a large number of actuators and joints are connected to create mesh-like cellular robotic structures that can be actuated to achieve large deformation and shape change. In this two-part paper, we study the importance of different possible mesh topologies within the MACRO framework. Regular and semi-regular tilings of the plane are used as the candidate mesh topologies and simulated using finite element analysis (FEA). In Part 1, we use FEA to evaluate their passive stiffness characteristics. Using a strain-energy method, the homogenized material properties (Young's modulus, shear modulus, and Poisson's ratio) of the different mesh topologies are computed and compared. The results show that the stiffnesses increase with increasing nodal connectivity and that stretching-dominated topologies have higher stiffness compared to bending-dominated ones. We also investigate the role of relative actuator-node stiffness on the overall mesh characteristics. This analysis shows that the stiffness of stretching-dominated topologies scales directly with their cross-section area whereas bending-dominated ones do not have such a direct relationship.
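    The strain-energy homogenization step typically equates the FEA strain energy stored in a unit cell with that of an equivalent homogeneous solid under the same prescribed strain, with Poisson's ratio taken from the ratio of transverse to axial strain in the same load case. One common form of this equivalence, shown below, is only a sketch of the method; the paper's exact loading and boundary conditions may differ.

```latex
% One common form of the strain-energy equivalence used to extract effective
% (homogenized) moduli from a unit cell of volume V; the paper's exact
% loading and boundary conditions may differ.
U_{\mathrm{axial}} = \tfrac{1}{2}\, E^{*} V \varepsilon^{2}
  \;\Longrightarrow\; E^{*} = \frac{2\,U_{\mathrm{axial}}}{V \varepsilon^{2}},
\qquad
U_{\mathrm{shear}} = \tfrac{1}{2}\, G^{*} V \gamma^{2}
  \;\Longrightarrow\; G^{*} = \frac{2\,U_{\mathrm{shear}}}{V \gamma^{2}}
```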
  7. Modular active cell robots (MACROs) are a design approach in which a large number of linear actuators and passive compliant joints are assembled to create an active structure with a repeating unit cell. Such a mesh-like robotic structure can be actuated to achieve large deformation and shape change. In this two-part paper, we use finite element analysis (FEA) to model the deformation behavior of different MACRO mesh topologies and evaluate their passive and active mechanical characteristics. In Part I, we presented the passive stiffness characteristics of different MACRO meshes. In this Part II, we investigate the active strain characteristics of planar MACRO meshes. Using FEA, we quantify and compare the strains generated for a specific choice of MACRO mesh topology and, further, for the specific choice of actuators actuated within that mesh. We simulate a series of actuation modes based on the angular orientation of the actuators within the mesh and show that such actuation modes result in deformation that is independent of the size of the mesh. We also show that a subset of these actuation modes spans the range of deformation behavior. Finally, we compare the actuation effort required to actuate different MACRO meshes and show that the actuation effort is related to the nodal connectivity of the mesh.
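    As a toy illustration of grouping actuators into orientation-based actuation modes (a hypothetical helper, not the paper's simulation setup), one could bucket the edges of a planar mesh by their angle:

```python
# Hypothetical sketch (not the paper's code): group the actuators (edges) of
# a planar mesh by their angular orientation, so each group can be driven
# together as one "actuation mode".
import math
from collections import defaultdict

def actuation_modes(nodes, edges, tol_deg=1.0):
    """nodes: {id: (x, y)}, edges: [(i, j), ...] -> {angle_deg: [edge, ...]}"""
    modes = defaultdict(list)
    for i, j in edges:
        (x1, y1), (x2, y2) = nodes[i], nodes[j]
        ang = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        key = round(ang / tol_deg) * tol_deg      # bucket nearby angles
        modes[key].append((i, j))
    return dict(modes)

# A single triangular cell: three edge orientations -> three actuation modes.
nodes = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.5, math.sqrt(3) / 2)}
edges = [(0, 1), (1, 2), (0, 2)]
print(actuation_modes(nodes, edges))   # keys ~ 0.0, 60.0, 120.0 degrees
```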
  8. We propose a novel method for approximate hardware implementation of univariate math functions with significantly fewer hardware resources compared to previous approaches. Examples of such functions include exp(x) and the activation function GELU(x), both used in transformer networks; gamma(x), which is used in image processing; and other functions such as tanh(x), cosh(x), sq(x), and sqrt(x). The method builds on previous work on hybrid binary-unary computing. The novelty of our approach is that we break a function into a number of sub-functions such that implementing each sub-function becomes cheap and converting the output of the sub-functions to binary becomes almost trivial. Our method also uses self-similarity in functions to further reduce the cost. We compare our method to conventional binary, previous stochastic computing, and hybrid binary-unary methods on several functions at 8-, 12-, and 16-bit resolutions. While preserving high accuracy, our method outperforms previous works in terms of hardware cost; e.g., tolerating less than 0.01 mean absolute error, our method reduces the (area × latency) cost on average by 5, 7, and 2 orders of magnitude compared to the conventional binary, stochastic computing, and hybrid binary-unary methods, respectively. Finally, we demonstrate the potential benefits of our method for natural language processing and image processing applications by deploying it to implement major blocks in an encoder layer of the BERT language model, as well as the Roberts Cross edge detection algorithm, both of which involve non-linear functions.
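    A simplified software model of the hybrid binary-unary split that this line of work builds on is sketched below. It is illustrative only; the function, bit widths, and table construction are assumptions, and the real designs are hardware circuits. The upper input bits select a sub-function in binary, while the lower bits are thermometer-encoded and the sub-function output is accumulated in the unary domain.

```python
# Simplified software model of the hybrid binary-unary split (illustrative;
# not the paper's hardware generator).  MSBs select a sub-function in binary;
# LSBs are thermometer-encoded and evaluated by accumulating per-bit
# increments, which is what a unary circuit can do cheaply.
import math

TOTAL_BITS, LSB_BITS = 8, 4
SCALE = 255                                  # 8-bit output range

def build_subfunctions(func):
    """For every MSB region, tabulate the base output value and the increment
    contributed by each thermometer bit (quantized function differences)."""
    n = 1 << TOTAL_BITS
    f = [round(SCALE * func(i / n)) for i in range(n + 1)]
    regions = []
    for msb in range(1 << (TOTAL_BITS - LSB_BITS)):
        base = msb << LSB_BITS
        increments = [f[base + k + 1] - f[base + k] for k in range(1 << LSB_BITS)]
        regions.append((f[base], increments))
    return regions

def hybrid_eval(x, regions):
    msb, lsb = x >> LSB_BITS, x & ((1 << LSB_BITS) - 1)
    thermometer = [1] * lsb + [0] * ((1 << LSB_BITS) - lsb)   # unary encoding
    base, increments = regions[msb]                            # binary select
    return base + sum(b * d for b, d in zip(thermometer, increments))

# Check against direct evaluation of the quantized function.
regions = build_subfunctions(lambda t: math.sin(t * math.pi / 2))
x = 137
print(hybrid_eval(x, regions), round(SCALE * math.sin((x / 256) * math.pi / 2)))
```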