Title: Efficient Open Modification Spectral Library Searching in High-Dimensional Space with Multi-Level-Cell Memory.
Open Modification Search (OMS) is a promising algorithm for mass spectrometry analysis that enables the discovery of modified peptides. However, OMS is challenging because it expands the search scope exponentially. Existing OMS accelerators either have limited parallelism or struggle to scale effectively with growing data volumes. In this work, we introduce an OMS accelerator that uses multi-level-cell (MLC) RRAM memory to increase storage capacity by 3x. Through in-memory computing, we achieve up to 77x faster data processing with two to three orders of magnitude better energy efficiency. We tested the design on a fabricated MLC RRAM chip. We leverage hyperdimensional computing to tolerate up to 10% memory errors while delivering massive parallelism in hardware.
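The error tolerance attributed to hyperdimensional computing above can be illustrated with a small simulation. This is a minimal sketch, not the accelerator's method: the dimensionality `D`, the library size, the random binary hypervectors, and the helper names (`random_hv`, `flip_bits`, `hamming`, `nearest`) are all assumptions made for illustration. It corrupts 10% of the bits of every stored hypervector and checks whether a nearest-neighbor search by Hamming distance still finds the right entry.

```python
import random

D = 8192          # hypervector dimensionality (assumed, not from the abstract)
LIBRARY_SIZE = 100  # number of stored reference hypervectors (assumed)


def random_hv(rng):
    """Return a random binary hypervector packed into a Python int."""
    return rng.getrandbits(D)


def flip_bits(hv, error_rate, rng):
    """Flip each of the D bits independently with probability error_rate."""
    mask = 0
    for i in range(D):
        if rng.random() < error_rate:
            mask |= 1 << i
    return hv ^ mask


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def nearest(query, library):
    """Index of the library entry with the smallest Hamming distance."""
    return min(range(len(library)), key=lambda i: hamming(query, library[i]))


rng = random.Random(0)
clean = [random_hv(rng) for _ in range(LIBRARY_SIZE)]

# Model memory errors: every stored hypervector has 10% of its bits flipped.
stored = [flip_bits(hv, 0.10, rng) for hv in clean]

target = 42
query = clean[target]          # the query itself is error-free
match = nearest(query, stored)
print("matched index:", match)
print("distance to corrupted target:", hamming(query, stored[target]))
```

Because unrelated random hypervectors differ in roughly half of their bits, a 10% corruption leaves the true match far closer than any other entry, which is the intuition behind the robustness claim.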
Award ID(s):
2120019
PAR ID:
10534243
Publisher / Repository:
ArXiv
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Zeroth-order fine-tuning eliminates explicit back-propagation and reduces memory overhead for large language models (LLMs), making it a promising approach for on-device fine-tuning tasks. However, existing memory-centric accelerators fail to fully leverage these benefits due to inefficiencies in balancing bit density, compute-in-memory capability, and endurance-retention trade-off. We present a reliability-aware, analog multi-level-cell (MLC) eDRAM-RRAM compute-in-memory (CIM) solution co-designed with zeroth-order optimization for language model fine-tuning. An RRAM-assisted eDRAM MLC programming scheme is developed, along with a process-voltage-temperature (PVT)-robust, large-sensing-window time-to-digital converter (TDC). The MLC-eDRAM integrating two-finger MOM provides 12× improvement in bit density over state-of-the-art MLC design. Another 5× density and 2× retention benefits are gained by adopting BEOL In2O3 FETs. 
  2. RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. On the other hand, in the presence of RRAM device variations and lower precision, the mapping of DNNs to RRAM-based IMC suffers from severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro using a programmable shifter to compensate for the RRAM variations and recover the accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The non-ideal output from the RRAM macro, due to device and circuit non-idealities, is compensated by adding the precise output from the SRAM macro. In addition, the programmable shifter allows for different scales of compensation by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework for the training of DNNs to support the hybrid IMC architecture through ensemble learning. The proposed framework performs quantization (weights and activations), pruning, RRAM IMC-aware training, and employs ensemble learning through different compensation scales by utilizing the programmable shifter. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65nm SUNY process to demonstrate its efficacy. Experimental evaluation of the hybrid IMC architecture shows that the SRAM compensation allows for a realistic IMC architecture with multi-level RRAM cells (MLC) even though they suffer from high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively. 
  3. Resistive Random Access Memory (RRAM) devices hold promise as a key enabler technology for energy-efficient, in-memory, and brain-inspired computing paradigms, with the potential to significantly enhance high-performance computing applications. However, the widespread adoption of RRAM technology in high-performance computing applications is hindered by non-ideal device metrics and various reliability challenges. RRAM devices are reported to exhibit critical device-to-device (D2D) and cycle-to-cycle (C2C) variability. In this paper, we investigate D2D and C2C variabilities of Tantalum Oxide RRAM devices and explore potentiation, depression, and endurance dynamics under varying operation conditions. Our ultimate goal is to address performance and reliability issues associated with the oxide-based RRAM device technology and facilitate its broader implementation in future computing applications. 
  4. Motivation: Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotate modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets. Results: We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data to hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes the two stages of the existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA’s tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides comparable accuracy to competing search tools. Availability and implementation: HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.
  5. Hafnium-oxide based bipolar RRAM was investigated for high-level temporal correlation detection using in-memory computing. The experimental analog data of HfO2 RRAM, in both the RESET and SET regimes, was evaluated to detect 10 correlated processes from 25 processes on a 5x5 RRAM array. Our method consumed 36,000-53,000 times less energy than a previous implementation with phase change memory, with a predicted speedup of 1600-2100 times over a 1xPOWER8 CPU (1 thread) for detecting correlation between 25 processes.
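Item 4 above describes encoding mass spectra into hypervectors with each dimension computed independently, and matching modified peptides through partial matches. The sketch below illustrates that idea with a generic ID-level encoding, which is a common hyperdimensional scheme and not necessarily the one HOMS-TC uses. The dimensionality, the bin and level counts, the purely random level hypervectors, and the toy peak lists are all assumptions made for illustration.

```python
import random

D = 2048         # hypervector dimensionality (assumed)
NUM_BINS = 200   # number of m/z bins (assumed)
NUM_LEVELS = 8   # number of quantized intensity levels (assumed)

rng = random.Random(1)


def rand_bipolar():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


# One ID hypervector per m/z bin and one per intensity level.
# Simplification: level hypervectors are independent random vectors.
ID_HVS = [rand_bipolar() for _ in range(NUM_BINS)]
LEVEL_HVS = [rand_bipolar() for _ in range(NUM_LEVELS)]


def encode(peaks):
    """Encode a spectrum given as a list of (bin_index, level_index) peaks.

    Each peak is bound (elementwise product of its ID and level vectors),
    and all peaks are bundled by a majority vote per dimension.
    """
    hv = []
    for d in range(D):  # every dimension is computed independently
        total = sum(ID_HVS[b][d] * LEVEL_HVS[lvl][d] for b, lvl in peaks)
        hv.append(1 if total >= 0 else -1)
    return hv


def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / D


# Toy spectra: nine peaks each, so the majority vote never ties.
reference = [(b, b % NUM_LEVELS) for b in range(0, 90, 10)]
# "Modified" spectrum: six peaks unchanged, three shifted to other bins.
modified = reference[:6] + [(b + 3, lvl) for b, lvl in reference[6:]]
# Unrelated spectrum: every peak falls in a different bin.
unrelated = [(b + 5, lvl) for b, lvl in reference]

ref_hv = encode(reference)
sim_modified = similarity(ref_hv, encode(modified))
sim_unrelated = similarity(ref_hv, encode(unrelated))
print("reference vs modified :", round(sim_modified, 3))
print("reference vs unrelated:", round(sim_unrelated, 3))
```

In this toy setup the partially matching spectrum keeps a clearly positive similarity to the reference while the unrelated one stays near zero. Because the loop over `d` has no cross-dimension dependencies, it maps naturally onto parallel hardware.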