<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Advancements in Content-Addressable Memory (CAM) Circuits: State-of-the-Art, Applications, and Future Directions in the AI Domain</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>01/01/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10590969</idno>
					<idno type="doi">10.1109/TCSI.2025.3527309</idno>
					<title level='j'>IEEE Transactions on Circuits and Systems I: Regular Papers</title>
<idno>1549-8328</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Tergel Molom-Ochir</author><author>Brady Taylor</author><author>Hai Li</author><author>Yiran Chen</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Content-Addressable Memory (CAM) circuits, distinguished by their ability to accelerate data retrieval through a direct content-matching function, are increasingly crucial in the era of AI and increasing data computation. With the rise of AI models, hardware matching and hashing capabilities become essential, underscoring the need for a comprehensive survey of this evolving technology. This survey explores various CAM types across circuit designs and technologies, highlighting contributions to fields such as Machine Learning and genomics. We review 37 CAM cell designs, focusing on emerging trends in area and energy efficiency, pivotal for next-generation computing. Furthermore, we discuss current challenges and suggest future research directions in CAM technology.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>queries efficiently. Analog CAM arrays store data as ranges of acceptable values, with analog inputs provided for matching. If an input value falls within the stored range of a cell, it is considered a match for that cell. This capability is useful for applications that require matching of continuous or multilevel data values <ref type="bibr">[37]</ref>. Lastly, differentiable CAMs use analog inputs, storage, and outputs, offering greater flexibility in search operations. Unlike standard analog CAMs, differentiable CAMs provide an analog output that indicates the degree of match between the input and the stored data <ref type="bibr">[38]</ref>. This enables a closest-match search between analog inputs and stored values.</p><p>The associative, or parallel, search mechanism makes CAMs extremely fast at searching through large datasets, which is particularly beneficial for functions such as matching and hashing, where a CAM determines whether a given pattern exists in a large table of data. Binary and ternary CAMs have been applied to data-intensive genomics computations <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref>, <ref type="bibr">[9]</ref>, <ref type="bibr">[10]</ref>, <ref type="bibr">[11]</ref>, <ref type="bibr">[12]</ref>, Hamming distance calculations <ref type="bibr">[39]</ref>, and hashing <ref type="bibr">[2]</ref>, <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref>, <ref type="bibr">[18]</ref>, <ref type="bibr">[19]</ref>, to name a few. 
Moreover, the range-based search mechanism of analog CAMs has been applied to machine learning model mapping and acceleration <ref type="bibr">[37]</ref>.</p><p>Big data and AI applications, such as computer vision and natural language processing, require vast amounts of data to be accessed and processed rapidly. In these workloads, the speed of CPUs and GPUs is significantly limited by the latency and bandwidth of traditional memory hierarchies, a challenge known as the 'memory wall', and in-memory computing solutions are emerging to address it <ref type="bibr">[40]</ref>. CAM's ability to perform parallel searches in hardware significantly enhances the performance of systems that require fast in-memory data search and processing. In a CAM, the input data is broadcast to all rows of stored data, and all rows are compared simultaneously. Instead of checking each entry sequentially, a CAM locates the matching entry in a single search operation, achieving constant time complexity O(1) with respect to the number of stored entries <ref type="bibr">[22]</ref>, <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>, <ref type="bibr">[25]</ref>. By minimizing data transfers between the memory and the processing units, this reduces latency and maximizes bandwidth utilization, helping to overcome the memory wall.</p><p>CAM technology is advancing rapidly, and a comprehensive survey is needed to understand its current state in the age of AI and its future directions. This survey aims to fill that gap by providing an in-depth analysis of the latest developments in CAM technologies and their applications in the AI landscape.</p><p>In this paper, we classify CAM technologies into several categories based on their operational mechanisms and underlying memory technologies. Section II will delve into CAM circuits, detailing different types of CAMs as well as the semiconductor and emerging non-volatile memory (NVM) technologies used to implement them. 
Section III will provide a comprehensive analysis of emerging CAM applications. Lastly, Section IV discusses challenges, potential research directions, and anticipated advancements in CAM technology.</p></div>
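<div xmlns="http://www.tei-c.org/ns/1.0"><p>The parallel search described above can be modeled in software. The following Python sketch is an illustration, not a circuit model: the hypothetical function cam_search returns every row address whose stored word equals the query, and hamming_distances returns the per-row mismatch count that a Hamming-distance MLSA would report. The table contents are invented for illustration, except that the query 110010 and its match at address 2 mirror the Fig. 1b example. A hardware CAM evaluates all rows in a single search operation; the loop here only reproduces the per-row results.</p><ab><![CDATA[
```python
from typing import List, Sequence

Word = Sequence[int]  # one stored row of bits, e.g. [1, 1, 0, 0, 1, 0]


def cam_search(table: List[Word], query: Word) -> List[int]:
    """Return the addresses of all rows whose stored bits equal the query.

    Hardware broadcasts the query to every row and compares all rows at
    once; this software model loops over rows but yields the same result.
    """
    matches = []
    for address, row in enumerate(table):
        if len(row) != len(query):
            raise ValueError("stored word and query must have the same width")
        # A row matches only if every stored bit equals its search bit.
        if all(stored == searched for stored, searched in zip(row, query)):
            matches.append(address)
    return matches


def hamming_distances(table: List[Word], query: Word) -> List[int]:
    """Number of mismatching bits per row, as a Hamming-distance MLSA reports."""
    return [sum(stored != searched for stored, searched in zip(row, query))
            for row in table]


# Hypothetical 4-row table; row 2 holds the Fig. 1b search word 110010.
table = [
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
]
query = [1, 1, 0, 0, 1, 0]
print(cam_search(table, query))         # [2]
print(hamming_distances(table, query))  # [2, 3, 0, 6]
```
]]></ab><p>A hardware CAM typically forwards the per-row match flags to an encoder rather than returning a list of addresses; the list form is used here only to make multiple matches visible.</p></div>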
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. CAM CIRCUITS</head><p>Unlike standard memories such as SRAMs, which are accessed by a specific memory address, CAMs retrieve data based on the content itself. In other words, a CAM operates on a data-in, address-out principle, whereas standard memories operate on an address-in, data-out principle.</p><p>The core cell, which compares a single bit of stored and query data, is the fundamental unit of any CAM array. CAM cells are connected vertically via bit lines, with a driver controlling the input, and horizontally via matchlines, along which the stored bit patterns reside. Each matchline has a sense amplifier to finalize the readout, and an encoder translates the sense amplifier outputs into the binary address of the matched matchline in the case of a single best match. Most CAM arrays support exact matches, which require all cells on a matchline to match. A CAM cell consists of two parts: storage circuitry and compare circuitry. The comparison operation in a CAM cell involves two phases: precharge and evaluation. Despite the numerous core cell designs that have been proposed and implemented, and the existing reviews of them, no comprehensive survey of recent CAM cell developments and emerging AI applications exists. This paper presents 39 different CAM core cells, shown in Table <ref type="table">V</ref>, and discusses their applications in the age of AI.</p><p>In this section, we examine the various circuits used in CAM technology. Different circuit designs provide different functionalities and performance characteristics to meet diverse application needs. CAMs are classified by their underlying concepts and by the technologies used to implement them. This section explores several types of CAMs, each with unique advantages and limitations, together with the technologies used to construct them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Matchline Architecture</head><p>CAM architectures can be broadly classified into two types, NOR and NAND, each with distinct characteristics, as shown in Fig. <ref type="figure">2</ref>.</p><p>In the NOR-type architecture, each cell has a pull-down transistor connected to the match line. The match line is precharged high, and during a search operation, each cell compares its stored value with the input value. If the values differ, the cell's pull-down transistor discharges the match line, indicating a mismatch. If all cells match, no pull-down transistor is activated and the match line remains high. Because all cells are compared simultaneously, the NOR architecture is fast. However, it consumes considerable power, because every mismatching match line is discharged, and must be precharged again, on each search operation.</p><p>The NAND-type architecture uses pass transistors. Each cell has a pass transistor in series along the match line, and the match signal propagates sequentially through these transistors. When a cell matches, its pass transistor turns on, allowing the signal to pass toward the sense amplifier; if a cell mismatches, its transistor remains off and blocks the signal. This architecture is power-efficient because it avoids repeatedly precharging and discharging the match lines. However, the sequential nature of the matching process makes it slower than the NOR architecture.</p><p>In summary, the NOR-type architecture suits high-speed applications at the cost of higher power consumption, whereas the NAND-type architecture offers lower power usage but slower operation <ref type="bibr">[36]</ref>, <ref type="bibr">[41]</ref>, <ref type="bibr">[42]</ref>. Balancing these characteristics is key to optimizing CAM performance for specific applications. 
Hybrid systems that combine the strengths of both matchline architectures have been developed to achieve high-speed, low-power CAM operation <ref type="bibr">[28]</ref>.</p></div>
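<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the NOR/NAND trade-off concrete, the toy Python model below, an illustration rather than a circuit simulation, computes the same match result both ways. For the NAND case it also counts the series stages the signal traverses, and ml_discharges counts how many matchlines change state in one search as a crude proxy for dynamic power. It assumes, as is conventional, that a NOR matchline discharges on a mismatch and a NAND matchline discharges only on a full match, and it ignores internal-node charge sharing in the NAND chain. All function names are hypothetical.</p><ab><![CDATA[
```python
from typing import List, Sequence, Tuple


def nor_matchline(row: Sequence[int], query: Sequence[int]) -> bool:
    """NOR-type ML: precharged high; any mismatching cell pulls it low.

    All pull-down transistors sit in parallel, so the result is available
    after one evaluation step regardless of word length.
    """
    mismatch = any(s != q for s, q in zip(row, query))
    return not mismatch  # ML stays high (match) only if no cell pulls it down


def nand_matchline(row: Sequence[int], query: Sequence[int]) -> Tuple[bool, int]:
    """NAND-type ML: one series pass transistor per cell; the signal reaches
    the sense amplifier only if every cell matches.

    Returns (match, number_of_series_stages_the_signal_passed).
    """
    stages = 0
    for s, q in zip(row, query):
        if s != q:
            return False, stages  # an OFF pass transistor blocks the chain here
        stages += 1
    return True, stages


def ml_discharges(table: List[Sequence[int]], query: Sequence[int]) -> Tuple[int, int]:
    """Matchlines that change state in one search (a crude dynamic-power proxy).

    NOR: every mismatching row discharges and must be precharged again.
    NAND: only rows whose whole chain conducts (the matches) discharge.
    """
    nor = sum(not nor_matchline(row, query) for row in table)
    nand = sum(nand_matchline(row, query)[0] for row in table)
    return nor, nand


table = [[1, 0, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1], [1, 0, 0, 0]]
query = [1, 0, 1, 1]
print(ml_discharges(table, query))      # (3, 1)
print(nand_matchline(table[0], query))  # (True, 4)
print(nand_matchline(table[3], query))  # (False, 2)
```
]]></ab><p>With one matching row out of four, the NOR array discharges three matchlines per search while the NAND array discharges one. Since most rows mismatch in a typical search, this gap is the source of the NOR architecture's higher power, whereas the NAND stage count grows with word length, reflecting its slower, sequential evaluation.</p></div>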
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Architecture and Peripherals</head><p>Fig. <ref type="figure">3</ref> depicts the basic structure of a CAM system required to execute high-speed search operations. Its integral elements include precharge circuitry, which precharges the matchlines before each search; a query register, which stores the input search word and broadcasts it to the searchlines; matchline sense amplifiers (MLSAs), which sense a match or a mismatch; and an encoder, which converts the match results into a binary address. Depending on the application and architectural requirements, MLSAs can be designed to output an exact match/mismatch signal or a Hamming distance. In addition, address decoders and write drivers enable writing and updating stored data, making the system programmable. Some applications require additional peripherals, such as multiple-match resolvers used in networking to handle simultaneous matches, priority encoders that determine the highest-priority match in routing applications, and segmented matchlines that reduce power in energy-sensitive designs. In simpler read-only CAM systems, the write drivers and address decoders can be removed to further simplify the design. This modularity allows the architecture to be optimized for particular performance and power requirements.</p></div>
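<div xmlns="http://www.tei-c.org/ns/1.0"><p>The encoder, priority-encoder, and multiple-match-resolver stages described above can be sketched as follows. This is an illustrative Python model under stated assumptions: the MLSA outputs are given as a list of 0/1 flags, lower addresses are assumed to have higher priority (a common convention, for example when longer routing prefixes are stored first), and the function names are hypothetical.</p><ab><![CDATA[
```python
from typing import List, Optional


def resolve_multiple(match_bits: List[int]) -> List[int]:
    """Multiple-match resolver sketch: addresses of all asserted matchlines."""
    return [address for address, bit in enumerate(match_bits) if bit]


def priority_encode(match_bits: List[int]) -> Optional[int]:
    """Address of the highest-priority asserted matchline, or None.

    Assumption: a lower address means a higher priority.
    """
    for address, bit in enumerate(match_bits):
        if bit:
            return address
    return None


def to_binary_address(address: int, width: int) -> str:
    """Fixed-width binary encoding of an address, as the encoder outputs it."""
    return format(address, "0{}b".format(width))


mlsa_out = [0, 0, 1, 0, 1, 0, 0, 0]       # hypothetical MLSA outputs, 8 rows
width = (len(mlsa_out) - 1).bit_length()  # 3 address bits for 8 rows
print(resolve_multiple(mlsa_out))                           # [2, 4]
print(to_binary_address(priority_encode(mlsa_out), width))  # 010
```
]]></ab><p>Here two rows match: the resolver reports both, while the priority encoder forwards only address 2 (binary 010). When at most one row can match, a plain encoder and the priority encoder produce the same address.</p></div>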
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. CAM Cell Concepts</head><p>Depending on the specific needs and limitations of the application, different CAM circuits have been realized over the years (Fig. <ref type="figure">4</ref>). In digital CAMs, which are predominantly implemented using SRAM technology, the inputs, storage, and outputs are represented as binary values (0 and 1), while digital ternary storage adds a "don't care" state (X) for flexibility. More recently, the analog CAM concept was introduced <ref type="bibr">[37]</ref>; analog CAMs allow input and storage values ranging from 0.0 to 1.0, facilitating multilevel matching, and analog ternary storage also includes the X state.</p><p>Differentiable CAMs operate fully in analog, producing their outputs through differentiable functions. Digital CAMs are discussed in depth in Section II-C.1, analog CAMs in Section II-C.2, and differentiable CAMs in Section II-C.3. The various existing CAM concepts are summarized in Table I.</p><p>TABLE I TYPES OF CAM CONCEPTS</p><p>TABLE II CELL DESIGNS FOR BINARY AND TERNARY CAM USING SRAM (10T BINARY, 16T TERNARY) AND DRAM (3T1C BINARY, 6T2C TERNARY) TECHNOLOGIES <ref type="bibr">[31]</ref>, <ref type="bibr">[43]</ref>, <ref type="bibr">[44]</ref></p><p>1) Digital CAMs: Digital CAMs are specialized devices designed for high-speed search operations. They are implemented using SRAM and DRAM technologies and operate entirely with digital inputs, storage, and outputs. At the cell level, binary CAM cells perform an XNOR operation between the stored bit and the search bit. If the stored and search bits are the same (both 0 or both 1), the cell outputs a "1" (match); otherwise, it outputs a "0" (mismatch). At the row level, if all cells in a row match the input data, the row outputs a "1" on the matchline, and the corresponding memory address is returned. For instance, as shown in Fig. <ref type="figure">1b</ref>, if the input data is 110010, the CAM compares this input with all rows in parallel. If the third row matches the input data, the CAM outputs a "1" for this row, and the encoder returns its zero-indexed address "2". Ternary CAMs (TCAMs) operate similarly to binary CAMs but include a third state, "don't care" (X), which matches either input bit. For example, a TCAM row storing 10X1 matches both 1011 and 1001. Like binary CAMs, TCAMs produce a simple match or mismatch signal.</p><p>While digital CAMs are advantageous for their simple search capabilities, they often suffer from large physical size due to the complexity of their digital circuitry. Table <ref type="table">II</ref> shows example binary and ternary SRAM implementations based on the 6T SRAM cell.</p><p>2) Analog CAMs: Analog CAMs operate with analog inputs and storage, enabling the processing and storage of a wide range of continuous data values <ref type="bibr">[37]</ref>. Typically realized using emerging non-volatile memory (NVM) technologies, they are valuable for applications requiring precise data control. Despite the analog inputs and storage, the output is digital (match or mismatch). This hybrid approach can increase storage density and design efficiency but may introduce noise-related challenges. The design is well suited to multilevel and non-binary state matching.</p><p>Analog CAMs store data as ranges between 0 and 1. When an analog input is provided, it is compared with these ranges. If the input falls within a stored range, the cell outputs a "1" (match). For example, if a cell stores a range between 0.36 and 0.75 and the input is 0.60, the cell outputs a match. If the input is outside this range, a "0" (mismatch) is returned. 
When all cells on a matchline output a match, the address of that matchline is returned.</p><p>3) Differentiable CAMs: Differentiable CAMs, which operate fully in analog, store data as ranges and receive analog inputs, providing an analog output that indicates the degree of match for each row rather than a simple "True" or "False" result, as can be seen in Fig. <ref type="figure">5</ref>. If a cell stores a range between 0.36 and 0.72, an input of 0.24, being closer to the stored range, causes a smaller current in aMLlo and a slower discharge of the matchline, resulting in a higher analog output (weak mismatch) than an input of 0.12 (strong mismatch). For any value within the range, e.g., 0.48, the ML stays charged, indicating a match. This all-analog implementation links CAMs to analog crossbar arrays, enabling fast searches that determine the degree of match between tables of analog values and analog inputs, and thus broadens the range of applications that can perform such searches in analog.</p></div>
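<div xmlns="http://www.tei-c.org/ns/1.0"><p>The match semantics of the ternary, analog, and differentiable cells described above can be summarized in a short Python sketch. It is an idealized behavioral model, not a circuit model: in particular, dcam_cell_distance uses the distance from the input to the nearest stored bound as a stand-in for the mismatch current on aMLlo, which is a simplifying assumption. All function names are hypothetical; the example values reuse those given in the text.</p><ab><![CDATA[
```python
from typing import Sequence, Tuple

Range = Tuple[float, float]  # (low, high) bounds stored in one analog cell


def tcam_match(stored: str, query: str) -> bool:
    """Ternary match: a stored 'X' (don't care) matches either input bit."""
    return all(s == "X" or s == q for s, q in zip(stored, query))


def acam_cell_match(cell: Range, x: float) -> bool:
    """Analog CAM cell: match if the input lies inside the stored range."""
    low, high = cell
    return low <= x <= high


def acam_row_match(row: Sequence[Range], xs: Sequence[float]) -> bool:
    """Digital row output: match only if every cell on the matchline matches."""
    return all(acam_cell_match(cell, x) for cell, x in zip(row, xs))


def dcam_cell_distance(cell: Range, x: float) -> float:
    """Idealized dCAM cell: 0 inside the range, otherwise the distance to
    the nearest bound (a stand-in for the mismatch current on aMLlo)."""
    low, high = cell
    return max(low - x, 0.0, x - high)


def dcam_row_distance(row: Sequence[Range], xs: Sequence[float]) -> float:
    """Analog row output: summed cell distances; smaller means a weaker mismatch."""
    return sum(dcam_cell_distance(cell, x) for cell, x in zip(row, xs))


print(tcam_match("10X1", "1011"), tcam_match("10X1", "1001"))  # True True
print(acam_cell_match((0.36, 0.75), 0.60))                     # True
print([round(dcam_cell_distance((0.36, 0.72), x), 2) for x in (0.48, 0.24, 0.12)])
# [0.0, 0.12, 0.24]
```
]]></ab><p>The last line reproduces the ordering described in the text: 0.48 lies inside the range, 0.24 is a weak mismatch, and 0.12 is a stronger one. A real dCAM output is a discharge current or voltage rather than a linear distance, so only the ordering, not the values, should be read from this sketch.</p></div>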
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Semiconductor Technologies for CAM</head><p>1) SRAM: The storage circuitry is usually an SRAM cell, most often the six-transistor (6T) cell <ref type="bibr">[29]</ref>, <ref type="bibr">[31]</ref>, <ref type="bibr">[34]</ref>, <ref type="bibr">[35]</ref>, <ref type="bibr">[41]</ref>, <ref type="bibr">[45]</ref>, <ref type="bibr">[46]</ref>, <ref type="bibr">[47]</ref>. The 6T cell comprises two cross-coupled inverters and two access transistors. The compare circuitry consists of transistors connected to the matchline and the search lines, which compare the stored bit with the search bit. Table <ref type="table">II</ref> shows example binary and ternary SRAM cell designs.</p><p>This structure provides stable data storage without the need for periodic refreshing, making it well suited to applications that require high-speed memory. However, it faces scaling challenges that lead to larger cell sizes and higher production costs.</p><p>The two cross-coupled inverters form a stable storage element that holds a single bit of data. The two remaining nMOS transistors act as access transistors controlled by the word line. When writing data, the word line is activated, allowing the data bit to be written into the storage nodes through the bit lines. The bit is held at the two complementary storage nodes of the cross-coupled inverters and remains stable as long as power is supplied. During a read, the word line is activated again, allowing the stored data to be read out through the bit lines.</p><p>In a NOR-type CAM, a typical comparison proceeds as follows. The match line is first precharged to a high voltage (logical 1), and the search lines then carry the search data. If any stored bit does not match the corresponding search bit, the compare transistors of that cell pull the match line down to a low voltage (logical 0). Thus, the match line remains high only if all bits match. 
When a mismatch is detected, the match line discharges through the nMOS transistors of the mismatched cells.</p><p>SRAM-CAMs are fast and have high endurance. However, they are volatile, consume more power, and occupy a large area. SRAM-CAMs are best suited to high-speed applications such as network routers and switches and high-speed caches.</p><p>2) DRAM: Dynamic Random Access Memory (DRAM)-based CAMs store data on capacitors accessed through transistors, a configuration that supports compact, high-density designs. Although economical and capable of higher cell density than SRAM-based CAMs, this structure requires frequent refresh cycles because charge leaks from the capacitors. This refresh overhead reduces the overall speed of memory access. Table <ref type="table">II</ref> shows example binary and ternary DRAM cell designs.</p><p>Data in DRAM-based CAMs is stored as electrical charge on a capacitor, representing binary data (1s and 0s). Each cell includes an access transistor, controlled by a word line, that determines whether the capacitor is charged (to store a '1') or discharged (to store a '0'). Periodic refresh cycles that replenish the leaked charge are essential for maintaining data integrity.</p><p>The comparison mechanism within DRAM-based CAM cells employs two key transistors in series that link the match line (ML) to ground. These transistors are driven by the stored bit and by the inverse of the search bit. During a search operation, if the stored data matches the input query, only one transistor turns on, so the ML does not discharge. Conversely, a mismatch turns on both transistors, creating a direct path from the ML to ground that discharges it. The state of the ML, either holding its precharged level in the event of a match or discharging in the case of a mismatch, is detected by matchline sense amplifiers to confirm the presence or absence of a match. 
During a match, ideally no current flows from the match line, which remains at its precharged level. If there is a mismatch, the conducting transistor pair allows current to flow from the match line to ground, pulling its voltage down.</p><p>DRAM-based CAMs are ideal for large-scale memory applications where cost and density take priority over speed. They are particularly advantageous in fields requiring substantial memory resources, such as database acceleration, machine learning inference, and big data analysis. Despite their higher density and cost-effectiveness, their slower access times and need for regular refreshes should be considered when choosing memory for high-speed applications.</p></div>
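<div xmlns="http://www.tei-c.org/ns/1.0"><p>The DRAM comparison above describes a single series pair driven by the stored bit and the inverted search bit. Such a pair alone catches only one mismatch polarity (stored 1, searched 0); a stored 0 searched with 1 leaves both transistors off. The Python sketch below assumes the conventional NOR compare structure found in many SRAM-based cells, with a second, complementary series path per cell, and shows how storing two independent gate bits per cell yields the ternary X state. It is an illustrative logic model under that assumption; the exact DRAM cells of Table II may be organized differently, and all names are hypothetical.</p><ab><![CDATA[
```python
from typing import Dict, List, Tuple

# Each cell stores two bits (d_a, d_b) that gate its two pull-down paths.
# Binary '1' -> (1, 0), binary '0' -> (0, 1); ternary 'X' -> (0, 0).
ENCODE: Dict[str, Tuple[int, int]] = {"1": (1, 0), "0": (0, 1), "X": (0, 0)}


def cell_pulls_down(stored: Tuple[int, int], search: int) -> bool:
    """True if either series pull-down path of the cell conducts.

    Path A: transistor gated by d_a in series with one gated by NOT search.
    Path B: transistor gated by d_b in series with one gated by search.
    A series path conducts only when both of its transistors are on.
    """
    d_a, d_b = stored
    path_a = bool(d_a) and not search
    path_b = bool(d_b) and bool(search)
    return path_a or path_b


def matchline_high(word: str, query: List[int]) -> bool:
    """The precharged ML stays high only if no cell opens a path to ground."""
    return not any(cell_pulls_down(ENCODE[c], s) for c, s in zip(word, query))


def single_path_detects(stored_bit: int, search: int) -> bool:
    """Only path A (stored bit, inverted search bit), as described in the text."""
    return bool(stored_bit) and not search


print(matchline_high("10X1", [1, 0, 1, 1]))  # True
print(matchline_high("10X1", [1, 0, 0, 1]))  # True (X ignores bit 2)
print(matchline_high("10X1", [0, 0, 1, 1]))  # False
print(single_path_detects(0, 1))             # False: this mismatch is missed
```
]]></ab><p>The last line shows why a single pair is not sufficient for a binary cell: a stored 0 searched with 1 turns off both transistors of that pair, so a complementary path, or an equivalent mechanism, is needed to detect that mismatch.</p></div>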
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Emerging Non-Volatile Memory (NVM) Technologies</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TABLE III</head><p>CELL DESIGNS FOR CAM USING EMERGING NON-VOLATILE TECHNOLOGIES: TERNARY ReRAM (2T2M) <ref type="bibr">[48]</ref>, TERNARY MTJ (4T-2MTJ) <ref type="bibr">[49]</ref>, TERNARY FeFET (4T-2FeFET) <ref type="bibr">[50]</ref>, ANALOG ReRAM (6T2M) <ref type="bibr">[37]</ref>, AND DIFFERENTIABLE (6T2M) <ref type="bibr">[38]</ref></p><p>1) Resistive RAM (ReRAM): ReRAM-based CAM cells are typically built as 2T2R (two transistors and two ReRAMs) or 3T1R (three transistors and one ReRAM) for ternary CAMs, and as 6T2M (six transistors and two memristors) for analog and differentiable CAMs. The transistors provide access control and signal amplification, while the memristors store data through their resistive switching. For example, in a 2T2R cell, two transistors (T1 and T2) control read and write access to the cell, and two ReRAM devices (R1 and R2) store the binary data, using their high-resistance state (HRS) and low-resistance state (LRS) to represent logic '1' and '0'. Data in ReRAM-based CAM cells is thus stored in the resistive states of the memristors. During a write operation, a voltage is applied across the ReRAM device to change its resistance state. For example, applying a higher voltage might set the device to LRS (logic '0'), while applying a lower voltage or reverse polarity might set it to HRS (logic '1'). The resistance state is non-volatile, so the device retains its data even when power is turned off. ReRAM cells are compact, allowing higher-density memory arrays, and unlike SRAM and DRAM, which require constant power, ReRAM retains data without power. 
Moreover, ReRAM-based designs generally consume less power, especially during idle periods and search operations.</p><p>For digital binary or ternary CAMs built from the 2T2R cell shown in Table <ref type="table">III</ref>, the current through the cell during a read or search operation is determined by the combined resistance of R1 and R2. For a match (both memristors in the same state), the current takes its expected value (high if both are in LRS, low if both are in HRS). For a mismatch (memristors in different states), one high and one low resistance produce an intermediate current that differs from the expected value.</p><p>In the 6T2M analog CAM cell <ref type="bibr">[37]</ref> shown in Table <ref type="table">III</ref>, which checks whether the input lies within a stored range, each half of the cell evaluates either a 'greater than' or a 'less than' condition. When an input voltage is applied to the Data Line (DL), the conductances of transistors T1 and T3 are compared with the conductances stored in memristors M1 and M2, respectively. If the conductance of T1 is greater than that of M1, transistor T2 remains off, preventing the Match Line (ML) from being pulled down and indicating DL &gt; M1. Conversely, if DL &lt; M1, a conductive path from the Source Line high (SLhi) to T2 turns T2 on, pulling ML down to ground and indicating a mismatch.</p><p>The circuit has been modified to operate as a differentiable CAM (dCAM) by adding the ability to sense the discharge current on aMLlo, which represents a distance metric between the input and the stored data. In the former case, the current changes as aMLhi discharges but is generally assumed constant while T2 and T6 are in saturation. In the latter case, the current is constant and depends on the voltage on aDL and the stored conductance. If the input is close to the stored values, a small current flows in aMLlo, whereas a large difference results in a larger current that rapidly discharges aMLhi. 
An example cell design is shown in Table <ref type="table">III</ref>.</p><p>Further, Khan and Rashid <ref type="bibr">[48]</ref> discuss a hybrid ternary CAM that uses memristors to minimize area and wiring complexity. Bazzi et al. <ref type="bibr">[51]</ref> introduced new analog CAM cell designs using memristors, with an emphasis on the gain of the cell's constituent parts. Combining memristors with transistors yields more compact, power-efficient, reliable, and high-performance memory architectures, which are highly sought after in big data and IoT applications.</p><p>ReRAM-CAMs are non-volatile, high-density, and low-power, but they have limited endurance and exhibit variability in their resistance states. They are best suited to energy-efficient applications such as machine learning model acceleration and edge computing.</p><p>2) Magnetoresistive RAM (MRAM): The MTJ-based non-volatile ternary content-addressable memory (NV-TCAM) cell consists of transistors and magnetic tunnel junctions (MTJs). Table <ref type="table">III</ref> shows an example 2MTJ-4T design. The MTJs act as resistors with two possible states determined by their magnetization: parallel alignment (low resistance, RL) and antiparallel alignment (high resistance, RH). The transistors include NMOS transistors that connect the MTJs to the complementary search lines (SL and SL̄) and a PMOS transistor that acts as a voltage keeper to stabilize the match line (ML) voltage during the evaluation phase. Additionally, a diode-connected NMOS transistor on the ML controls the discharge path.</p><p>Data in the NV-TCAM cell is stored in the resistance states of the MTJs. For a binary '0', R1 is set to RH and R2 to RL. For a binary '1', R1 is set to RL and R2 to RH. For 'don't care' (X), both R1 and R2 are set to RH. 
The resistance state of each MTJ is determined by the relative magnetization alignment of its two ferromagnetic layers, which can be changed by passing a specific current through the MTJ.</p><p>During the precharge phase, SL and SL̄ are grounded, and the ML voltage (V_ML) is precharged to VDD through an external precharge transistor. In the evaluation phase, SL and SL̄ are driven to opposite voltages (VDD and GND, or vice versa) depending on the search data. If the stored data matches the search data, the connected MTJ is in the high-resistance state (RH), resulting in a high D-node voltage (V_D,H). V_ML then discharges from VDD only to V_D,H + V_TH,keeper, where V_TH,keeper is the threshold voltage of the voltage keeper, at which point the keeper cuts off. If there is a mismatch, the connected MTJ is in the low-resistance state (RL), resulting in a low D-node voltage (V_D,L), and V_ML discharges further, to V_D,L + V_TH,keeper. The match line (ML) sense amplifier detects the resulting voltage difference (ΔV_ML) to determine whether the TCAM word matches the search data.</p><p>MTJ-CAMs are non-volatile, high-speed, and have high endurance. These advantages come with high cost and fabrication complexity. Because they combine speed and non-volatility, they are best suited to high-speed non-volatile memory applications.</p><p>3) Ferroelectric Field-Effect Transistor (FeFET): A ferroelectric field-effect transistor (FeFET) incorporates a ferroelectric material into its gate dielectric. This material can maintain its polarization state even without a power supply. A ternary FeFET CAM cell typically consists of two FeFETs and four transistors, as shown in Table <ref type="table">III</ref>.</p><p>Data storage in FeFET-based CAM cells is achieved through the polarization states of the ferroelectric material. Applying a voltage to the gate polarizes the ferroelectric layer, writing binary data (0 or 1) according to the direction of the polarization. 
This polarization remains stable even when the power is turned off, ensuring non-volatile data storage. In a 2FeFET TCAM, each cell uses two FeFETs connected in parallel between a matchline (ML) and a sourceline (ScL). During a search operation, a specific voltage is applied to the gate of each FeFET in the CAM array, and the current response of the FeFET indicates whether the stored polarization state matches the input data. If the stored state matches the input data, no current flows from ML to GND, signifying a match.</p><p>A logic '1' is written by applying V_write to the gate (BL/SL) and GND to the source (ScL), while a logic '0' is written by reversing these voltages. The don't care state (X) is stored by writing logic '0' into both FeFETs. During a search operation, the matchline (ML) is precharged high, and search voltages are applied to the gates (SL/SL̄) according to the input data: V_search for logic '1' and 0 for logic '0'. The inputs to the transistors (SL and SL̄) and the stored states (S and S̄) determine whether the pull-down paths are ON or OFF. If there is a match, both pull-down paths remain OFF, keeping the ML high. In the event of a mismatch, at least one pull-down path is ON, discharging the ML. The current flow thus indicates a match or mismatch, while the don't care state keeps the ML high regardless of the input <ref type="bibr">[50]</ref>, <ref type="bibr">[65]</ref>.</p><p>Leveraging the multilevel-cell states of FeFETs, a 2FeFET-based CAM design, shown in Table III, can store continuous analog values by setting upper and lower bounds using two FeFETs connected to an inverted searchline (SL). 
Each FeFET defines one bound for matching the input voltage (V_SL).</p><p>TABLE IV COMPARISON OF CAM TECHNOLOGIES</p><p>During a search, the ML is precharged, and if V_SL falls within the stored range (between the upper and lower bounds set by the FeFETs' threshold voltages), the ML remains high, indicating a match. If V_SL is outside this range, one of the FeFETs turns on and discharges the ML, indicating no match. This configuration allows flexible and efficient range-based searching and matching, suitable for applications requiring continuous range storage and multi-bit quantized searches <ref type="bibr">[66]</ref>.</p><p>The fast switching of FeFETs makes FeFET-CAMs well suited to high-speed search operations, while their non-volatility ensures data retention without power. However, they have limited endurance due to wear of the ferroelectric material and involve complex materials and fabrication processes. FeFET-CAMs are well suited to low-power, high-speed applications such as genomic data processing, real-time data analytics, in-memory computing, and memory-augmented neural networks <ref type="bibr">[50]</ref>, where FeFET-based TCAMs can drastically reduce energy use and latency.</p></div>
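<div xmlns="http://www.tei-c.org/ns/1.0"><p>The MTJ-based NV-TCAM encoding described above can be checked with a small behavioral model. The text gives the stored (R1, R2) states but not which MTJ each search value connects to the D node; for a matching cell to always present RH, a search '0' must select R1 and a search '1' must select R2, and the Python sketch below adopts that inferred assignment as an assumption. It abstracts the analog voltages into a high or low D-node level, and the names R, MTJ_ENCODING, and word_matches are hypothetical.</p><ab><![CDATA[
```python
from enum import Enum
from typing import Dict, Tuple


class R(Enum):
    """MTJ resistance states."""
    RL = "low (parallel)"
    RH = "high (antiparallel)"


# Stored value -> (R1, R2), following the encoding given in the text.
MTJ_ENCODING: Dict[str, Tuple[R, R]] = {
    "0": (R.RH, R.RL),
    "1": (R.RL, R.RH),
    "X": (R.RH, R.RH),
}


def connected_mtj(stored: str, search_bit: int) -> R:
    """Resistance of the MTJ that the search lines connect to the D node.

    Assumption (inferred from the encoding): a search '0' selects R1 and a
    search '1' selects R2, so a matching cell always presents RH.
    """
    r1, r2 = MTJ_ENCODING[stored]
    return r2 if search_bit else r1


def d_node_level(stored: str, search_bit: int) -> str:
    """'H' for a high D-node voltage (V_D,H, match), 'L' for V_D,L (mismatch)."""
    return "H" if connected_mtj(stored, search_bit) is R.RH else "L"


def word_matches(word: str, query: str) -> bool:
    """The ML stays near V_D,H + V_TH,keeper only if no cell drives its D node low."""
    return all(d_node_level(c, int(q)) == "H" for c, q in zip(word, query))


print(word_matches("1X0", "110"))  # True
print(word_matches("1X0", "111"))  # False
```
]]></ab><p>Because the X state sets both MTJs to RH, whichever MTJ the search selects presents a high resistance, so that cell never pulls the D node low; this is the behavioral counterpart of the 'don't care' encoding in the text.</p></div>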
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F. Comparison of CAM Technologies</head><p>As shown in Table <ref type="table">IV</ref>, comparing SRAM, DRAM, and NVM CAM designs reveals different strengths and trade-offs for each technology. SRAM-based CAMs offer the highest speed and endurance and are thus well suited to high-end applications, including networking; however, their very high power consumption and large area footprint limit their scalability. DRAM-based CAMs offer high density and low cost, enabling compact solutions for large, memory-hungry applications such as machine learning inference, but they are inherently volatile and potentially slower to access. Finally, NVM CAMs, including ReRAM-, MRAM-, and FeFET-based architectures, are characterized by non-volatility, energy efficiency, and a compact form factor. Each NVM type has its own strengths: ReRAM offers high density and low power but faces endurance challenges; MRAM is high-speed and offers good endurance at a higher cost; and FeFET allows fast multi-state storage but suffers from fabrication complexity and poor endurance. The final choice of CAM technology depends on the application's specific requirements for speed, power efficiency, cost, and memory density.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. EMERGING APPLICATIONS</head><p>In the space of memory technology, the rapid development and transformation of CAM has been marked by significant milestones. In the early 1990s, CAM applications began with associative memory <ref type="bibr">[76]</ref>, associative processing <ref type="bibr">[77]</ref>, and image processing <ref type="bibr">[20]</ref>, <ref type="bibr">[21]</ref>. With the widening processor-memory gap and the rise of AI models, in-memory computing has become a promising direction, and CAMs have emerged as a potential solution for overcoming the "memory wall." Since 2020, CAM development has been driven by emerging technologies such as ferroelectric devices and compute-in-memory arrays, which exploit CAM's core operating principles for AI hardware and large-scale data processing <ref type="bibr">[5]</ref>, <ref type="bibr">[22]</ref>, <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>, <ref type="bibr">[25]</ref>, <ref type="bibr">[26]</ref>, <ref type="bibr">[27]</ref>, <ref type="bibr">[37]</ref>, <ref type="bibr">[50]</ref>, <ref type="bibr">[65]</ref>, <ref type="bibr">[66]</ref>, <ref type="bibr">[67]</ref>. Fig. <ref type="figure">8</ref> shows the CAM application trends over the last ten years.</p><p>This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.</p><p>Authorized licensed use limited to: University of Texas at San Antonio. Downloaded on May 16,2025 at 22:43:34 UTC from IEEE Xplore. Restrictions apply.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TABLE V</head><p>CAM CELL DESIGNS</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Machine Learning</head><p>Recent advances in tree-based machine learning models and analog CAMs have demonstrated significant potential for in-memory acceleration. Notably, memristive devices have been used to build analog CAMs that accelerate these models <ref type="bibr">[24]</ref>, and further developments have enabled the acceleration of Deep Random Forests on CAMs <ref type="bibr">[75]</ref>. For neural networks, BRAHMS, a hybrid analog RAM and CAM system, addressed redundant analog-to-digital conversions in RRAM-based CNN accelerators, improving performance and energy efficiency <ref type="bibr">[23]</ref>. Efficient NN acceleration on GPGPUs was achieved by storing important features in CAM <ref type="bibr">[71]</ref>, while another study introduced a CAM-based binarized neural network accelerator using time-domain signal processing <ref type="bibr">[72]</ref>. Additionally, a ferroelectric ternary CAM was used for one-shot learning in a memory-augmented neural network <ref type="bibr">[67]</ref>. For Transformer networks, CAM-based processing-in-memory techniques have been combined with novel attention mechanisms to overcome computational and memory-bandwidth bottlenecks. iMCAT, an architecture combining crossbars and CAMs for Transformer acceleration, uses locality-sensitive hashing to filter sequence elements by importance <ref type="bibr">[26]</ref>. Furthermore, iMTransformer <ref type="bibr">[27]</ref> and RACE-IT, a Reconfigurable Analog CAM-crossbar Engine, have been proposed to accelerate in-memory Transformer operations, with RACE-IT enabling efficient analog execution of various non-MVM operations within Transformer models <ref type="bibr">[25]</ref>.</p><p>TABLE VI PERFORMANCE IMPROVEMENTS OVER SOTA ACROSS CAM-BASED APPLICATIONS</p></div>
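<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mapping that lets analog CAMs accelerate tree-based models can be illustrated in software. The Python sketch below assumes the general approach in which every root-to-leaf path of a decision tree becomes one CAM row storing, for each feature, the interval of values consistent with that path; because the leaves of a tree partition the input space, an input matches exactly one row, whose leaf label is the prediction. The <code>Node</code> structure, the branching convention (left when the feature value is at most the threshold), and the toy tree are hypothetical and are not taken from <ref type="bibr">[24]</ref> or <ref type="bibr">[75]</ref>.</p><eg><![CDATA[
```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Each CAM row stores one half-open interval (lo, hi] per feature:
# a feature value x matches when lo < x <= hi.
Interval = Tuple[float, float]
CamRow = Tuple[List[Interval], str]


@dataclass
class Node:
    """Hypothetical binary decision-tree node (a leaf when label is set)."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None   # followed when x[feature] <= threshold
    right: Optional["Node"] = None  # followed when x[feature] > threshold
    label: Optional[str] = None


def tree_to_cam_rows(root: Node, n_features: int) -> List[CamRow]:
    """Unroll every root-to-leaf path into one row of per-feature intervals."""
    rows: List[CamRow] = []

    def walk(node: Node, bounds: List[Interval]) -> None:
        if node.label is not None:
            rows.append((bounds, node.label))
            return
        lo, hi = bounds[node.feature]
        left_bounds = list(bounds)
        left_bounds[node.feature] = (lo, min(hi, node.threshold))
        right_bounds = list(bounds)
        right_bounds[node.feature] = (max(lo, node.threshold), hi)
        walk(node.left, left_bounds)
        walk(node.right, right_bounds)

    walk(root, [(-math.inf, math.inf)] * n_features)
    return rows


def cam_predict(rows: List[CamRow], x: List[float]) -> str:
    """Search all rows with input x; the single matching row gives the class."""
    matches = [label for bounds, label in rows
               if all(lo < xi <= hi for xi, (lo, hi) in zip(x, bounds))]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching row, got {len(matches)}")
    return matches[0]


if __name__ == "__main__":
    # Toy tree: x0 <= 0.5 gives "A"; otherwise x1 <= 2.0 gives "B", else "C".
    tree = Node(0, 0.5,
                left=Node(label="A"),
                right=Node(1, 2.0, left=Node(label="B"), right=Node(label="C")))
    rows = tree_to_cam_rows(tree, n_features=2)
    print(len(rows))                      # 3 (one row per leaf)
    print(cam_predict(rows, [0.7, 3.0]))  # C
```
]]></eg><p>In hardware, each interval maps onto one analog CAM cell's stored range and all rows are evaluated in a single parallel search; the list comprehension in <code>cam_predict</code> only stands in for that parallelism.</p></div>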
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Genomics</head><p>CAM's matching capabilities have significantly advanced genomic data processing, improving both speed and efficiency. In 2020, PARC, a Processing-in-Memory architecture using ReRAM-based CAM, was introduced to target the computationally intensive chaining step in DNA alignment <ref type="bibr">[7]</ref>. This step, which involves ordering and aligning sequences based on similarity, is demanding because of the large amounts of genomic data involved. In 2022, BioSEAL extended CAM applications in genomics, aiming to accelerate biological sequence alignment more broadly <ref type="bibr">[10]</ref>. In 2023, DASH-CAM, a dynamic-storage-based CAM system for pathogen classification, highlighted the dynamic storage capabilities of CAM <ref type="bibr">[5]</ref>. Additionally, ASMCap, which employs a capacitive multi-level CAM for approximate string matching in genomic sequence analysis, explored the potential of non-ReRAM-based CAMs <ref type="bibr">[8]</ref>. These developments illustrate the distinctive role of CAM technologies in data-intensive biological research and medical diagnostics.</p></div>
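<div xmlns="http://www.tei-c.org/ns/1.0"><p>Approximate matching of short DNA substrings against a stored reference table is the kind of operation these accelerators move into the array. As a rough behavioral sketch, assume each CAM row stores one reference k-mer with its reference label, and a row reports a hit when its Hamming distance to the query k-mer is within a tolerance; a read can then be classified by counting, per reference, how many of its k-mers hit. The function names, k = 4, the tolerance, and the toy sequences are hypothetical, and designs such as PARC, BioSEAL, DASH-CAM, and ASMCap use their own encodings and scoring.</p><eg><![CDATA[
```python
from collections import Counter
from typing import Dict, List, Tuple


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a: str, b: str) -> int:
    """Number of mismatching bases between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_cam(references: Dict[str, str], k: int) -> List[Tuple[str, str]]:
    """One hypothetical CAM row per (reference k-mer, reference label)."""
    return [(km, name) for name, seq in references.items() for km in kmers(seq, k)]


def classify(read: str, cam: List[Tuple[str, str]], k: int, max_mismatch: int) -> Counter:
    """For each reference, count the read k-mers that hit at least one of its rows.

    Hardware would compare each query k-mer with every row in parallel;
    here that is a loop. A row hits when it differs in at most max_mismatch bases.
    """
    hits: Counter = Counter()
    for query in kmers(read, k):
        matched = {name for km, name in cam if hamming(query, km) <= max_mismatch}
        hits.update(matched)  # each reference counted at most once per query
    return hits


if __name__ == "__main__":
    refs = {"ref1": "ACGTACGT", "ref2": "TTTTGGGG"}
    cam = build_cam(refs, k=4)
    print(classify("ACGTTCGT", cam, k=4, max_mismatch=1))  # Counter({'ref1': 5, 'ref2': 1})
```
]]></eg><p>Setting the tolerance to zero turns the same table into an exact-match lookup; raising it trades specificity for robustness to sequencing errors and mutations.</p></div>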
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Hashing and Similarity Searches</head><p>Because CAMs compare and retrieve data directly within the memory hardware, they make similarity search calculations feasible in memory. Hamming distance calculations were performed in <ref type="bibr">[16]</ref> and <ref type="bibr">[18]</ref>, streamlining the process of searching and matching patterns within the memory. Nearest neighbor searches were performed in-memory in <ref type="bibr">[2]</ref> and <ref type="bibr">[15]</ref> using TCAMs and FeFET-based multi-bit CAMs, respectively. Moreover, to enable efficient processing of high-dimensional data, hashing is performed on chip using CAMs in <ref type="bibr">[19]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Specialized CAM Technologies</head><p>1) Optical CAM: Optical Content-Addressable Memory (Optical CAM) extends traditional CAMs with advanced photonic circuits, using light for search operations across memory entries. Compared with electronic CAMs, optical CAMs are significantly faster and more energy efficient. In an optical CAM, search data encoded as light signals interacts with optical data stored in the cell's memory structure. Match detection uses XOR functions implemented with semiconductor optical amplifiers (SOAs) and Mach-Zehnder interferometers (MZIs) to determine whether the search data matches the stored data. Data are written by changing the optical state of the storage element using SOA-MZI flip-flops, and read by sending a probe light and measuring the output with photodetectors. Optical CAM and RAM systems have achieved error-free 10 Gb/s operation using SOA-MZI-based optical flip-flops <ref type="bibr">[15]</ref>. Additionally, the address width was increased to 2 bits, and all-optical CAM systems were further developed <ref type="bibr">[17]</ref>, <ref type="bibr">[19]</ref>. Recently, ternary CAMs using optical multiplexing techniques have achieved speeds of up to 10 Gb/s <ref type="bibr">[18]</ref>.</p><p>2) Quantum CAM: Quantum-dot Cellular Automata (QCA) use the positions of electrons within quantum dots to represent binary information, offering a high-speed, low-power alternative to traditional CMOS technology. QCA-based CAM cells are highly efficient for nanoscale data storage and retrieval: data are stored in the spatial configuration of electrons in a cell, with binary states determined by the electron positions <ref type="bibr">[78]</ref>. The search operation initializes the cells during a precharge period and then compares the input data with the stored data along a matchline using QCA gates such as majority and minority gates. These gates check whether the input electron positions align with the stored configuration; if they do, a match signal is produced, and otherwise no signal is sent <ref type="bibr">[79]</ref>. The architecture includes arrays of QCA cells and gates for individual addressing and comparison. Notable achievements of QCA technology include operating speeds in the nanosecond range and a cell area of 0.14 &#181;m<hi rend="superscript">2</hi> <ref type="bibr">[79]</ref>.</p></div>
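<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pairing of on-chip hashing with Hamming-distance search can be sketched in software. The sketch below assumes random-hyperplane (sign random projection) locality-sensitive hashing, a standard LSH family chosen here purely for illustration and not necessarily the scheme used in <ref type="bibr">[19]</ref>, to turn real-valued vectors into binary signatures that would be stored as CAM rows. A query signature is compared with every row, and the row with the fewest mismatching bits is returned as the nearest neighbor. Function names, signature length, and seed are hypothetical.</p><eg><![CDATA[
```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes for sign-random-projection LSH."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> int:
    """Binary hash: bit i is 1 if vec lies on the non-negative side of plane i."""
    bits = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0.0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def nearest(query_sig: int, cam_rows: List[int]) -> int:
    """Row index with the smallest Hamming distance; ties go to the lowest index.

    A CAM compares the query with all rows at once; min() stands in for that.
    """
    return min(range(len(cam_rows)), key=lambda i: hamming(query_sig, cam_rows[i]))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=64, seed=1)
    stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cam_rows = [signature(v, planes) for v in stored]
    # A positively scaled copy of a stored vector hashes to the same signature.
    print(nearest(signature([0.0, 5.0, 0.0], planes), cam_rows))
```
]]></eg><p>Because vectors separated by a small angle agree on most hyperplane signs, a small Hamming distance between signatures serves as a proxy for high cosine similarity, which is what allows a binary CAM to perform approximate nearest neighbor search.</p></div>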
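<div xmlns="http://www.tei-c.org/ns/1.0"><p>The role of majority gates in the QCA search can be made concrete at the logic level. A three-input majority gate outputs 1 when at least two of its inputs are 1; fixing one input to 0 yields AND and fixing it to 1 yields OR, and QCA also provides inverters. The Python sketch below composes these primitives into a per-bit XNOR comparison and ANDs the results along a row to form the matchline output. It assumes a purely Boolean model and is not the specific cell layout or clocking scheme of <ref type="bibr">[79]</ref>; the function names are hypothetical.</p><eg><![CDATA[
```python
from typing import Sequence


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority gate: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def inv(a: int) -> int:
    """Inverter."""
    return 1 - a


def and2(a: int, b: int) -> int:
    """AND from a majority gate with its third input fixed to 0."""
    return majority(a, b, 0)


def or2(a: int, b: int) -> int:
    """OR from a majority gate with its third input fixed to 1."""
    return majority(a, b, 1)


def bit_match(stored: int, search: int) -> int:
    """XNOR (1 when the bits agree) built only from majority gates and inverters."""
    return or2(and2(stored, search), and2(inv(stored), inv(search)))


def row_match(stored: Sequence[int], search: Sequence[int]) -> int:
    """Matchline output: AND of all per-bit comparisons (1 = match)."""
    if len(stored) != len(search):
        raise ValueError("stored word and search word must have equal length")
    result = 1  # stands in for the initialized (precharged) matchline
    for s, q in zip(stored, search):
        result = and2(result, bit_match(s, q))
    return result


if __name__ == "__main__":
    print(row_match([1, 0, 1, 1], [1, 0, 1, 1]))  # 1
    print(row_match([1, 0, 1, 1], [1, 1, 1, 1]))  # 0
```
]]></eg></div>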
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. CHALLENGES AND OUTLOOK</head><head>A. Challenges</head><p>CAMs face reliability challenges because they are susceptible to errors that degrade the accuracy and efficiency of CAM operations. These errors can cause incorrect data retrieval and increased latency, compromising the performance of systems that depend on data access and processing.</p><p>Maintaining accuracy in CAMs is made harder by the continuous downscaling of technology nodes, which leaves them vulnerable to soft errors caused by external electromagnetic radiation as well as internal voltage fluctuations and noise <ref type="bibr">[80]</ref>. In an exact search, every cell in a row must output a match for the row to match; therefore, as the number of cells in a row increases, the probability of encountering an error rises, necessitating error detection and correction mechanisms.</p><p>To address this challenge, researchers have developed various error detection and correction schemes. Pontarelli et al. <ref type="bibr">[81]</ref> proposed an error correction method based on a CAM/RAM system that does not alter the CAM's internal structure. Varada and Agrawal <ref type="bibr">[82]</ref> introduced a power-efficient TCAM architecture in which the traditional priority encoder is replaced with multiplexers and a 2D parity technique is used for multi-bit error detection and correction. Moreover, although analog CAMs enable powerful capabilities such as the acceleration of machine learning tasks and nonlinear activation functions, their reliance on memristors also raises error and reliability issues. Roth et al. <ref type="bibr">[83]</ref> developed a technique that overcomes these reliability issues by introducing coding schemes that require only minor hardware additions. These advances highlight the ongoing efforts to overcome reliability challenges in CAMs.</p></div>
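<div xmlns="http://www.tei-c.org/ns/1.0"><p>The row-width argument above can be made quantitative under a simple, assumed model in which each cell independently returns a wrong comparison with probability p. A row of N cells is correct only if all N cells are correct, so the row error probability is P_row = 1 - (1 - p)^N, which is approximately N times p when p is small. The value of p in the sketch is hypothetical and serves only to show the trend; it is not a measured soft-error rate.</p><eg><![CDATA[
```python
def row_error_probability(p_cell: float, n_cells: int) -> float:
    """P(at least one of n_cells independent cells errs) = 1 - (1 - p)^n."""
    if not 0.0 <= p_cell <= 1.0:
        raise ValueError("p_cell must be a probability in [0, 1]")
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return 1.0 - (1.0 - p_cell) ** n_cells


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-cell error probability, for illustration only
    for n in (32, 64, 144, 288):
        # For small p the result is close to n * p: it grows almost linearly with width.
        print(f"{n:4d} cells: {row_error_probability(p, n):.2e}")
```
]]></eg></div>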
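<div xmlns="http://www.tei-c.org/ns/1.0"><p>The basic idea behind 2D parity, mentioned above for <ref type="bibr">[82]</ref>, can be illustrated on a small block of stored bits. Under the simplifying assumption of a single flipped bit, one parity bit per row and one per column is enough to locate the error: the row and the column whose recomputed parity disagrees intersect at the faulty bit, which is then inverted. This is a generic textbook sketch with hypothetical names; it is not the architecture of <ref type="bibr">[82]</ref>, which also replaces the priority encoder with multiplexers and targets multi-bit errors.</p><eg><![CDATA[
```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def parities(bits: Matrix) -> Tuple[List[int], List[int]]:
    """Even-parity bit for every row and every column of a bit block."""
    row_par = [sum(row) % 2 for row in bits]
    col_par = [sum(col) % 2 for col in zip(*bits)]
    return row_par, col_par


def correct_single_error(bits: Matrix, row_par: List[int],
                         col_par: List[int]) -> Optional[Tuple[int, int]]:
    """Recompute parities, then flip the bit where the bad row and bad column cross.

    Returns None if all parities agree, the (row, col) of the corrected bit if
    exactly one row and one column disagree, and raises ValueError otherwise
    (an error is detected, but this simple scheme cannot locate it).
    """
    new_rows, new_cols = parities(bits)
    bad_rows = [i for i, (old, new) in enumerate(zip(row_par, new_rows)) if old != new]
    bad_cols = [j for j, (old, new) in enumerate(zip(col_par, new_cols)) if old != new]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        bits[i][j] ^= 1  # correct the stored bit in place
        return i, j
    raise ValueError("error detected but not correctable by 2D parity")


if __name__ == "__main__":
    block = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
    row_par, col_par = parities(block)  # computed when the data is written
    block[1][2] ^= 1                    # simulated soft error
    print(correct_single_error(block, row_par, col_par))  # (1, 2)
```
]]></eg><p>Two flips in the same row leave that row's parity unchanged but disturb two column parities, so the sketch detects the error without being able to correct it, which is why stronger multi-bit schemes are needed.</p></div>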
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Directions</head><p>Based on current trends in CAM technologies, the following areas merit particular emphasis in future research:</p><p>(1) Given the trajectory of CAM cell area reduction (Fig. <ref type="figure">6</ref>) and information density increase (Fig. <ref type="figure">7</ref>), we anticipate that this trend will continue as technology nodes and cell designs advance every few years. Notably, the adoption of memristor devices is expected to grow because designers favor their small cell area, non-volatility, and analog capabilities.</p><p>(2) In the era of AI, CAM technologies are undergoing a significant transformation. Traditionally used as look-up tables, CAMs are now demonstrating how their matching capabilities can be applied to machine learning tasks. The different CAM concepts illustrated in Table <ref type="table">I</ref>, with their varied search functions, will enable in-memory acceleration of operations within next-generation AI models. This motivates further exploration of CAM designs tailored specifically to AI workloads. Table <ref type="table">VI</ref> shows that CAM-based AI acceleration has been most beneficial in the key areas of natural language processing, computer vision, and bioinformatics.</p><p>(3) Prior works have shown that the parallel processing capabilities of CAMs have high potential to accelerate increasingly data- and computation-intensive genomic sequencing in hardware. The exploration of CAMs for genomics applications is expected to grow with the expanding computational biology market, driven by the need for efficient, high-speed data processing. To this end, extending different types of CAMs to genomic processing applications could be a meaningful future direction.</p><p>(4) Developing different CAM concepts, each with distinct input, storage, and output types, will enable hardware implementation of diverse search functions through circuit-level innovation. Catering to a wide range of applications, this approach will enable more flexible search capabilities, such as best match, threshold match, and partial match. Circuit-level innovation in search functions will therefore be a significant topic for future study.</p></div>
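<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the search functions named in item (4) concrete, the Python sketch below evaluates exact, best, threshold, and partial match over the same stored binary words. It is a software reference model only, assuming binary rows and Hamming distance as the similarity metric; the function names and the mask convention (a 1 in <code>care</code> means the bit is compared, loosely like a TCAM "don't care" in reverse) are hypothetical. A hardware CAM would produce each result in one parallel search rather than a loop.</p><eg><![CDATA[
```python
from typing import List, Optional


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def exact_match(rows: List[int], key: int) -> List[int]:
    """Rows identical to the key (the classic binary CAM search)."""
    return [i for i, r in enumerate(rows) if r == key]


def best_match(rows: List[int], key: int) -> Optional[int]:
    """Single row with the fewest mismatches; ties go to the lowest index."""
    if not rows:
        return None
    return min(range(len(rows)), key=lambda i: hamming(rows[i], key))


def threshold_match(rows: List[int], key: int, max_dist: int) -> List[int]:
    """Rows with at most max_dist mismatching bits."""
    return [i for i, r in enumerate(rows) if hamming(r, key) <= max_dist]


def partial_match(rows: List[int], key: int, care: int) -> List[int]:
    """Rows equal to the key on the bits set in care (other bits are ignored)."""
    return [i for i, r in enumerate(rows) if ((r ^ key) & care) == 0]


if __name__ == "__main__":
    rows = [0b1010, 0b1011, 0b0101, 0b1110]
    print(exact_match(rows, 0b1010))            # [0]
    print(threshold_match(rows, 0b1010, 1))     # [0, 1, 3]
    print(partial_match(rows, 0b1010, 0b1100))  # [0, 1]
    print(best_match(rows, 0b0111))             # 2
```
]]></eg><p>All four functions reuse the same per-bit mismatch signal and differ only in how mismatches are aggregated per row (none allowed, minimum, bounded count, or masked), which is where circuit-level innovation in sensing and matchline design comes in.</p></div>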
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. CONCLUSION</head><p>This survey presents CAM circuits as a transformative technology in the semiconductor memory landscape. We reviewed various types of CAMs, including digital, analog, and differentiable CAMs, as well as their underlying technologies such as SRAM, DRAM, ReRAM, MRAM, and FeFET. CAMs have demonstrated their potential to accelerate traditionally computationally expensive tasks such as machine learning algorithms, genomic data analysis, and hashing. Future research should focus on developing various CAM concepts and search functions, efficient error correction schemes, integrating CAMs with emerging AI models, and exploring new applications in computational biology. Ongoing advances in CAM technology are poised to address the computational demands of AI and other computation-intensive workloads, representing a significant step toward faster and more efficient hardware-based computational methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENT</head><p>The views, opinions, and/or findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the NSF.</p></div></body>
		</text>
</TEI>
