Sparse auto-encoders are useful for extracting low-dimensional representations from high-dimensional data. However, their performance degrades sharply when the input noise at test time differs from the noise employed during training. This limitation hinders the applicability of auto-encoders in real-world scenarios where the level of noise in the input is unpredictable. In this paper, we formalize single hidden layer sparse auto-encoders as a transform learning problem. Leveraging the transform modeling interpretation, we propose an optimization problem that leads to a predictive model invariant to the noise level at test time. In other words, the same pre-trained model is able to generalize to different noise levels. The proposed optimization algorithm, derived from the square root lasso, is translated into a new, computationally efficient auto-encoding architecture. After proving that our new method is invariant to the noise level, we evaluate our approach by training networks using the proposed architecture for denoising tasks. Our experimental results demonstrate that the trained models yield a significant improvement in stability against varying types of noise compared to commonly used architectures.
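For intuition only, here is a minimal sketch of the kind of model the abstract describes: a single-hidden-layer sparse auto-encoder unrolled from an iterative solver, with the soft-threshold scaled by the norm of the current reconstruction residual in the spirit of the square-root lasso. The dictionary size, number of unrolled iterations, step-size rule, and the residual-scaled threshold are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


def soft_threshold(z, tau):
    # Proximal operator of the l1 norm (elementwise shrinkage).
    return torch.sign(z) * torch.clamp(z.abs() - tau, min=0.0)


class SqrtLassoAutoencoder(nn.Module):
    """Unrolled single-hidden-layer sparse auto-encoder (illustrative only).

    The threshold applied to the code is proportional to the norm of the
    current reconstruction residual, loosely mimicking the square-root-lasso
    objective ||x - D z||_2 + lam * ||z||_1, whose regularization parameter
    does not need to be retuned when the noise level changes.
    """

    def __init__(self, dim, code_dim, lam=0.1, n_iter=5):
        super().__init__()
        self.D = nn.Parameter(0.01 * torch.randn(dim, code_dim))  # learned dictionary / decoder
        self.lam = lam
        self.n_iter = n_iter

    def forward(self, x):
        # x: (batch, dim) noisy input; returns its reconstruction (denoised estimate).
        z = x.new_zeros(x.shape[0], self.D.shape[1])
        step = 1.0 / (self.D.norm() ** 2 + 1e-8)  # crude step-size bound
        for _ in range(self.n_iter):
            residual = x - z @ self.D.T
            z = z + step * (residual @ self.D)                         # gradient step on the data term
            tau = self.lam * step * residual.norm(dim=1, keepdim=True)  # residual-scaled threshold
            z = soft_threshold(z, tau)                                  # sparsity step
        return z @ self.D.T
```

Because the threshold tracks the residual, rescaling the input noise rescales the shrinkage in the same proportion, which is the intuition behind invariance to the noise level at test time.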
A Single-Clock-Phase Sense Amplifier Architecture with 9x Smaller Clock-to-Q Delay Compared to the StrongARM & 6.3dB Lower Noise Compared to Double-Tail
A single-clock-phase sense amplifier architecture with strong regeneration is proposed. Designed in a 22 nm FinFET process, the proposed architecture achieves a 9× smaller clock-to-Q (t_CQ) delay than the conventional StrongARM latch and 6.3 dB lower input-referred noise than the Double-Tail architecture at similar input transistor size and power consumption.
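As a quick sanity check on the quoted figure, and assuming the 6.3 dB refers to the input-referred noise voltage (a 20·log10 convention, which is an assumption here), the linear ratio works out to roughly 2×:

```python
# 6.3 dB lower input-referred noise, interpreted as a voltage ratio (20*log10 convention).
voltage_ratio = 10 ** (6.3 / 20)
print(f"Double-Tail noise / proposed noise ≈ {voltage_ratio:.2f}×")  # ≈ 2.07×
```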
- Award ID(s): 2006571
- PAR ID: 10355811
- Date Published:
- Journal Name: International Symposium on VLSI Technology Systems and Application
- ISSN: 2469-3863
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Vertical profiles of temperature and dewpoint are useful in predicting deep convection that leads to severe weather, which threatens property and lives. Currently, forecasters rely on observations from radiosonde launches and numerical weather prediction (NWP) models. Radiosonde observations are, however, temporally and spatially sparse, and NWP models contain inherent errors that influence short-term predictions of high-impact events. This work explores using machine learning (ML) to postprocess NWP model forecasts, combining them with satellite data to improve vertical profiles of temperature and dewpoint. We focus on different ML architectures, loss functions, and input features to optimize predictions. Because we are predicting vertical profiles at 256 levels in the atmosphere, this work provides a unique perspective on using ML for 1D tasks. Compared to baseline profiles from the Rapid Refresh (RAP), ML predictions offer the largest improvement for dewpoint, particularly in the middle and upper atmosphere. Temperature improvements are modest, but CAPE values are improved by up to 40%. Feature importance analyses indicate that the ML models are primarily correcting incoming RAP biases. While additional model and satellite data offer some improvement to the predictions, architecture choice is more important than feature selection in fine-tuning the results. Our proposed deep residual U-Net (sketched after this list) performs the best by leveraging spatial context from the input RAP profiles; however, the results are remarkably robust across model architectures. Further, uncertainty estimates for every level are well calibrated and can provide useful information to forecasters.
- Fabric draping, the process of forming textile reinforcements over a 3D mold, is a critical stage in composites manufacturing because it determines the fiber orientation that affects subsequent infusion and curing processes and the resulting structural performance. The goal of this study is to predict fabric deformation during the draping process and develop an in-depth understanding of that deformation through an architecture-based discrete Finite Element Analysis (FEA). A new, efficient discrete fabric modeling approach is proposed in which the textile architecture is represented by virtual fiber tows modeled as Timoshenko beams and connected by springs and dashpots at the intersections of the interlaced tows (a connectivity sketch of this representation appears after this list). Both picture-frame and cantilever-beam bending tests were carried out to characterize the input model parameters. The predictive capability of the proposed modeling approach is demonstrated by predicting the deformation and shear angles of a fabric subjected to hemisphere draping. Key deformation modes, including bending and shearing, are successfully captured by the proposed model. The virtual fiber tow model resolves individual tow deformation during draping while remaining computationally efficient in large-scale fabric draping simulations. Because the discrete fabric architecture and the inter-tow interactions are represented explicitly, the model promotes a deeper understanding of fiber tow deformation modes and their contribution to the overall fabric deformation response.
- As the size of real-world graphs continues to grow at an exponential rate, performing Graph Convolutional Network (GCN) inference efficiently is becoming increasingly challenging. Prior works that employ a unified computing engine with a predefined computation order lack the flexibility and scalability needed to handle diverse input graph datasets. In this paper, we introduce OPT-GCN, a chiplet-based accelerator design that performs GCN inference efficiently while providing flexibility and scalability through an architecture-algorithm co-design. On the architecture side, the proposed design integrates a unified computing engine in each chiplet and an active interposer, both of which are adaptable to efficiently perform the GCN inference and facilitate data communication. On the algorithm side, we propose dynamic scheduling and mapping algorithms to optimize memory access and on-chip computations for diverse GCN applications (the dataflow choice these algorithms manage is illustrated after this list). Experimental results show that, compared to HyGCN, AWB-GCN, and GCNAX, the proposed design reduces memory accesses by 11.3×, 3.4×, and 1.4× and saves energy by 15.2×, 3.7×, and 1.6× on average, respectively.
- Fully Homomorphic Encryption (FHE) presents a paradigm-shifting framework for performing computations on encrypted data, offering revolutionary implications for privacy-preserving technologies. This paper introduces a novel hardware implementation of scheme switching between two leading FHE schemes that target different computational needs: the arithmetic scheme CKKS and the Boolean scheme FHEW. The proposed architecture facilitates dynamic switching between the schemes with improved throughput and latency compared to the software baseline. Its computation modules support the scheme-switching operations of coefficient conversion, modulus switching, and key switching (the end-to-end switching flow is sketched after this list). We also optimize the hardware designs for the pre-processing and post-processing blocks, which involve key generation, encryption, and decryption. The effectiveness of the proposed design is verified on the Xilinx U280 Datacenter Acceleration FPGA. We demonstrate that the proposed scheme-switching accelerator yields a 365× performance improvement over its software counterpart.
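Sketch for the first related item above: a minimal 1-D residual U-Net over 256 vertical levels, written in PyTorch. It is not the authors' network; the channel counts, depth, and the choice to predict a residual correction to the input profiles are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ResBlock1D(nn.Module):
    # Two 1-D convolutions wrapped in a skip connection.
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(ch, ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.conv2(self.act(self.conv1(x))))


class ResidualUNet1D(nn.Module):
    """Shallow 1-D residual U-Net over a 256-level vertical column.

    Input channels could hold, e.g., NWP temperature/dewpoint profiles plus
    satellite-derived features; output channels hold corrected profiles.
    """

    def __init__(self, in_ch=2, out_ch=2, base=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv1d(in_ch, base, 3, padding=1), ResBlock1D(base))
        self.down = nn.MaxPool1d(2)
        self.enc2 = nn.Sequential(nn.Conv1d(base, 2 * base, 3, padding=1), ResBlock1D(2 * base))
        self.up = nn.ConvTranspose1d(2 * base, base, kernel_size=2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv1d(2 * base, base, 3, padding=1), ResBlock1D(base))
        self.head = nn.Conv1d(base, out_ch, kernel_size=1)

    def forward(self, x):                # x: (batch, in_ch, 256)
        s1 = self.enc1(x)                # (batch, base, 256)
        s2 = self.enc2(self.down(s1))    # (batch, 2*base, 128)
        u = self.up(s2)                  # (batch, base, 256)
        d = self.dec1(torch.cat([u, s1], dim=1))
        # Predict a correction to the first out_ch input channels (the raw
        # NWP profiles) rather than predicting the profiles from scratch.
        return x[:, : self.head.out_channels, :] + self.head(d)


profiles = torch.randn(4, 2, 256)        # 4 columns, 2 variables, 256 levels
corrected = ResidualUNet1D()(profiles)   # same shape as the input profiles
```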
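Sketch for the second related item: the discrete representation it describes, virtual fiber tows as beam segments joined by spring-dashpot connectors at crossovers, reduced here to its connectivity for a flat plain-weave patch. The class names, placeholder stiffness values, and grid layout are assumptions; the Timoshenko beam formulation and the solver are not shown.

```python
from dataclasses import dataclass


@dataclass
class BeamElement:
    # Segment of a virtual fiber tow between two crossover nodes; in the full
    # model each segment would carry Timoshenko-beam stiffness.
    node_i: int
    node_j: int
    tow: str          # "warp" or "weft"


@dataclass
class CrossoverLink:
    # Spring-dashpot connector at a warp/weft interlacing point, governing the
    # relative rotation (trellis shear) between the two crossing tows.
    node: int
    k_rot: float = 1.0e-3   # placeholder rotational stiffness
    c_rot: float = 1.0e-5   # placeholder rotational damping


def build_plain_weave(n_rows: int, n_cols: int, spacing: float = 1.0):
    """Flat plain-weave patch: each row of nodes lies on one warp tow, each
    column on one weft tow, and every node is a warp/weft crossover."""
    def idx(r: int, c: int) -> int:
        return r * n_cols + c

    nodes = [(c * spacing, r * spacing, 0.0) for r in range(n_rows) for c in range(n_cols)]
    links = [CrossoverLink(node=idx(r, c)) for r in range(n_rows) for c in range(n_cols)]
    beams = []
    for r in range(n_rows):                      # warp segments along each row
        for c in range(n_cols - 1):
            beams.append(BeamElement(idx(r, c), idx(r, c + 1), "warp"))
    for c in range(n_cols):                      # weft segments along each column
        for r in range(n_rows - 1):
            beams.append(BeamElement(idx(r, c), idx(r + 1, c), "weft"))
    return nodes, beams, links


nodes, beams, links = build_plain_weave(4, 5)
print(len(nodes), len(beams), len(links))        # 20 nodes, 31 beam segments, 20 links
```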
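Sketch for the third related item: the computation a GCN accelerator must schedule, ReLU(Â X W), evaluated in the two possible multiplication orders. The intermediate-matrix sizes differ between the orders, which is the kind of choice the paper's dynamic scheduling manages; this NumPy/SciPy snippet only shows the arithmetic, not the chiplet hardware or the scheduling algorithms.

```python
import numpy as np
import scipy.sparse as sp


def gcn_layer(a_hat, x, w, aggregate_first=True):
    """One GCN propagation step, ReLU(A_hat @ X @ W), in either order.

    aggregate first : intermediate A_hat @ X has shape (N, F_in)
    combine first   : intermediate X @ W     has shape (N, F_out)
    The order changes the intermediate size, and hence the off-chip traffic.
    """
    if aggregate_first:
        h = (a_hat @ x) @ w      # sparse-dense product, then dense-dense
    else:
        h = a_hat @ (x @ w)      # dense-dense product, then sparse-dense
    return np.maximum(h, 0.0)


# Tiny example: 4-node graph with normalized adjacency (self-loops folded in).
a_hat = sp.csr_array(np.array([[0.5, 0.5, 0.0, 0.0],
                               [0.5, 0.5, 0.0, 0.0],
                               [0.0, 0.0, 0.5, 0.5],
                               [0.0, 0.0, 0.5, 0.5]]))
x = np.random.rand(4, 8)         # node features, F_in = 8
w = np.random.rand(8, 3)         # layer weights, F_out = 3
out1 = gcn_layer(a_hat, x, w, aggregate_first=True)
out2 = gcn_layer(a_hat, x, w, aggregate_first=False)
assert np.allclose(out1, out2)   # same result, different dataflow cost
```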
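Sketch for the fourth related item: the CKKS-to-FHEW-and-back flow at interface level. Every class and method name here is hypothetical (it is neither the paper's hardware interface nor a real FHE library API); the comparison-then-aggregation use case is just one common reason to switch schemes.

```python
from typing import List, Protocol, Sequence


class Ciphertext:
    """Opaque placeholder for an encrypted value (hypothetical)."""


class SchemeSwitchingBackend(Protocol):
    # Hypothetical interface; method names are illustrative only.
    def ckks_encrypt(self, values: Sequence[float]) -> Ciphertext: ...
    def ckks_to_fhew(self, ct: Ciphertext) -> List[Ciphertext]:
        """Per-slot conversion: coefficient conversion, modulus switching, key switching."""
    def fhew_greater_than(self, ct: Ciphertext, threshold: float) -> Ciphertext: ...
    def fhew_to_ckks(self, bits: Sequence[Ciphertext]) -> Ciphertext: ...
    def ckks_decrypt(self, ct: Ciphertext) -> List[float]: ...


def encrypted_threshold_flags(backend: SchemeSwitchingBackend,
                              data: Sequence[float],
                              threshold: float) -> List[float]:
    # Arithmetic-friendly work stays in CKKS; the non-polynomial comparison is
    # done in FHEW after switching, and the Boolean results are switched back
    # to CKKS so they can keep participating in arithmetic before decryption.
    ct = backend.ckks_encrypt(data)                # CKKS packs all values into slots
    fhew_slots = backend.ckks_to_fhew(ct)          # CKKS -> FHEW, one ciphertext per slot
    flags = [backend.fhew_greater_than(s, threshold) for s in fhew_slots]
    ct_flags = backend.fhew_to_ckks(flags)         # FHEW -> CKKS
    return backend.ckks_decrypt(ct_flags)          # 0/1 flag per input value
```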