# EMC: Efficient Muller C-Element Implementation for High Bit-width Asynchronous Applications

John M. Emmert Dept. of EECS University of Cincinnati Cincinnati, OH, USA john.emmert@uc.edu

Abstract—A Muller C-Element is a digital circuit component used in most asynchronous circuits and systems. In Null Convention Logic, the Muller C-Elements make up the subset of THmn threshold gates where the threshold, m, and the input bitwidth, n, are equal. This paper presents a new Efficient Muller C-Element implementation, EMC, that is especially suitable for Null Convention Logic applications with high input bit-widths, and it is much faster and smaller than standard implementations. It has a two-transistor switching delay that is independent of the input bitwidth, n, and exhibits low noise and static power consumption. It is suitable for all Muller C-Element applications, especially those like Null Convention Logic register feedback circuits that can have large input bit-widths. To reduce static power consumption, it uses active resistors that are only turned "ON" when necessary. Two output stages are presented to implement the required Muller C-Element digital hysteresis: standard, semi-static cross-coupled inverter version, and differential sense-amplifier option. For large values of *n*, our circuit requires approximately one-half fewer transistors than combining smaller Null Convention Logic THmn semi-static threshold gates. We have successfully simulated up to *n* = 1024 at a 65 nm node.

*Keywords*—Assurance, asynchronous, feedback register, logic, Muller C-element, null convention logic, security, side-channel attacks, threshold gates, trust

## I. INTRODUCTION

An exploitable weak point for synchronous or clocked integrated circuit (IC) based systems is the side-channel attack (SCA). By leveraging Trojan circuits added at an untrusted foundry, or by monitoring power consumption, electromagnetic emanation, temperature variation, or other indirect operational characteristics, a malicious, untrusted agent or entity can compromise security and steal sensitive information like credit card PINs and secret encryption keys [1-3].

One approach that has successfully mitigated SCAs is clockless asynchronous digital design [4]. There are various asynchronous methods that achieve clock mitigation, but not all of them completely remove the clock. Two major categories of circuits are *speed independent* and *delay independent* [5]. *Delay independent* systems entirely eliminate the need for clock signals and are useful as an obfuscation approach to mitigate SCAs. A popular implementation logic for *delay independent* asynchronous circuits is Null Convention Logic (NCL) [6]. The key component of NCL circuits is the TH*mn* threshold gate, where *m* is the threshold and *n* is the number of input bits or bit-width. This effort addresses the hardware implementation of a subset of the TH*mn* threshold gates, the TH*mm* or Muller C-Elements, where the threshold, *m*, is equal to the number of inputs, *n* [7-8]. Here, the TH*mm* notation will be used interchangeably with the term Muller C-Element.

Sara A. VanDewerker Dept. of EECS University of Cincinnati Cincinnati, OH, USA vandewsa@mail.uc.edu

Fig. 1 shows TH*mm*, the NCL symbol for the Muller C-Element, which forms a subset of the more general set of TH*mn* threshold gates.



Fig. 1. Muller C-Element with threshold = bit-width = m and output Z.

The following equation defines the function of the binary Muller C-Element or TH*mm* gate shown in Fig. 1

$$Z[k] = (I_1 \cdot I_2 \cdot \dots \cdot I_m) + (I_1 + I_2 + \dots + I_m) \cdot Z[k-1] \quad (1)$$

where Z[k] is the current output as a function of the previous output Z[k-1],  $I_j$  is digital input *j*, and *m* is the number of inputs.

In words, THmm is initialized so all m inputs are reset to '0.' This resets the output Z to '0.' The output Z will remain '0' until the threshold, m, is reached (all m inputs are set to '1'). Once the threshold is reached, the output Z is set to '1.' The Muller C-Element or THmm gate output Z demonstrates a digital hysteresis or latch effect and remains '1' until all m inputs are reset to '0.' At that point, the output Z is again reset to '0.'
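The behavior described by (1) can be checked with a short behavioral model. The following is a minimal sketch, not the authors' implementation: the function names `thmm_next` and `run_sequence` and the example input patterns are illustrative assumptions.

```python
from typing import Iterable, List, Sequence


def thmm_next(inputs: Sequence[int], z_prev: int) -> int:
    """Next output Z[k] of a THmm gate, following Eq. (1).

    Hypothetical helper: inputs are 0/1 values, z_prev is Z[k-1].
    """
    if len(inputs) == 0:
        raise ValueError("a THmm gate needs at least one input")
    all_set = all(inputs)  # I_1 . I_2 . ... . I_m
    any_set = any(inputs)  # I_1 + I_2 + ... + I_m
    return int(all_set or (any_set and bool(z_prev)))


def run_sequence(patterns: Iterable[Sequence[int]], z0: int = 0) -> List[int]:
    """Apply a sequence of input patterns and record Z after each one."""
    z = z0
    trace = []
    for pattern in patterns:
        z = thmm_next(pattern, z)
        trace.append(z)
    return trace


# Illustrative TH44 sequence: reset, partial set, full set, partial reset, reset.
patterns = [
    (0, 0, 0, 0),
    (1, 0, 1, 0),
    (1, 1, 1, 1),
    (0, 1, 1, 0),
    (0, 0, 0, 0),
]
print(run_sequence(patterns))  # [0, 0, 1, 1, 0]
```

The trace shows the hysteresis: a partially set input pattern leaves Z at '0' before the threshold is reached and at '1' after it.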

Null Convention Logic circuit implementations (including Muller C-Elements) would be more useful and acceptable to the general digital design community if component delays and sizes could be further reduced to more closely match conventional Boolean logic elements. For TH*mm* gates, delay and area increase with *m*, and because of limitations on the number of transistors that can be connected in series, the gates quickly become quite large even for small values of *m*. To improve acceptability of asynchronous circuit designs, a new efficient Muller C-Element (EMC) gate implementation is presented in Fig. 2. Its delay is two transistor levels regardless of the value of the input bit-width, *m*, and since the new implementation is not limited by the number of transistors in series, it is significantly smaller than those formed by staging up or combining smaller TH*mm* gates to create larger ones. Two output stages are presented to implement the TH*mm* gate digital hysteresis: a) a semi-static, cross-coupled inverter version, and b) a sense-amplifier option.

This work was supported in part by the NSF Center for Hardware and Embedded Systems Security and Trust (CHEST) under NSF Grant 1916722.





Fig. 2. EMC: Proposed two-transistor delay, TH*mm* Muller C-Element with *m* inputs, *I*, and output, *Z*. a) 2m + 8 transistor standard semi-static, cross-coupled inverter version, and b) 2m + 9 transistor sense-amplifier option.

## II. BACKGROUND

This section provides background for NCL Asynchronous Logic, the NCL Traditional Feedback Circuit (TFBC) Muller C-Element, and some other Muller C-Element implementations.

### A. Asynchronous NCL Logic

There are different types of asynchronous design techniques ranging from locally clocked to clockless, and each type has its own advantages and disadvantages [5]. One type of clockless logic circuit is based on NCL [6]. Null Convention Logic circuits work well for data flow designs because data flows through NCL networks in waves. A data wave is only processed when all incoming data is available, making it self-timed. Since data is only processed when available, no timing assumptions are required, and this attribute guarantees data sequencing and correct data arrival at the receiver under varying gate, process, and wire delays [6].

### B. Asynchronous NCL Registers

A subset of basic unweighted THmn threshold gates is the Muller C-Elements or THmm threshold gates. In this subset, the threshold m is equal to the bit-width of the threshold gate. This is an important subset, and it has many applications in both NCL and more general asynchronous logic. One example is in the asynchronous NCL equivalent of an N-bit register. Null Convention Logic registers work on the same principle as synchronous registers. Both separate blocks of combinational logic and pass waves of data from one block to the next. A key difference is that NCL registers pass data only when it is available and do not wait for clock edges. Null Convention Logic registers rely on handshakes or feedback (FB) signals. When ready to receive data, an NCL register communicates with the preceding register through a FB signal indicating it is okay to receive data, and each NCL register also receives a FB signal from the next register in the chain indicating it is okay to pass data. An example synchronous sequential dataflow circuit that uses N = 4 signal registers (N = register bit-width) is shown in Fig. 3, and similarly, Fig. 4 shows a functionally equivalent four-signal NCL asynchronous dataflow circuit with asynchronous registers. A key component of NCL asynchronous register handshaking is the TFBC. In Fig. 4 the TFBC examples (circled in red) are composed of TH44 gates with an inverter on their outputs. For  $m \ge 5$  input bits, the TFBC THmm gate is usually implemented by staging up smaller TH44 gates. It should be noted that the size and delay of the TFBCs are a function of the number of TFBC inputs, N, and ideally, a single THmm gate would be used where N = m.



Fig. 3. Example N = 4-signal synchronous data-path.



Fig. 4. Example N = 4-signal NCL asynchronous data-path.

To demonstrate the delay and area requirements of a typical THmm Muller C-Element gate used in a TFBC, Fig. 5 (a) shows a CMOS implementation of a traditional semi-static TH44 gate. As described earlier, limitations on the number of series-connected transistors in most technology nodes limit the input bit-width to four. Fig. 5 (b) shows an example m = 16-input staged-up THmm gate implementation for an N = 16-signal NCL TFBC register. The m = 16 bit-width staged-up THmm gate uses five TH44 cells for a total of four transistor delays and  $5 \cdot 12 = 60$  transistors. Thus, Fig. 5 (b) demonstrates how smaller THmm gates are staged to form larger THmm gates. These smaller THmm gates can be either static or semi-static. Static versions are larger, so we use the smaller, semi-static versions later for comparison and analysis.



Fig. 5. (a) Typical CMOS implementation of a 12 transistor, two level delay semi-static TH44 threshold gate, and (b) Example m = 16 bit-width THmm gate using five staged TH44 gates for a 16-signal TFBC NCL register.

### C. Muller C-Element

The main component of the TFBC is the Muller C-Element, or in NCL notation, the THmm gate. Fig. 1 and (1) describe the function of the THmm gate. The THmm gate is an asynchronous state-based component, and some previous implementations include [9-13]. To initialize it to a starting state, all m inputs should be reset to '0,' which resets or initializes the output Z to a '0' start state. The output Z should maintain a '0' value until all m inputs are set to a value of '1,' which then sets the state value of the output Z to a '1' value. The output Z will maintain a '1' state value until all m inputs are reset to '0,' then the output Z goes back to the reset or start state, '0.'

Since the majority of CMOS processes limit the number of series transistor connections to four, and given two transistor levels of delay for each TH44 gate, in general the number of transistor levels in a staged-up THmm gate is  $2 \cdot \lceil \log_4 m \rceil$ , where m is the number of inputs [14]. As an example application, in a 64-point complex FFT circuit, the data path would have 64 complex inputs. If each complex input is 2 × 8 bits, then the datapath has a total of m = 1024 bits and the number of levels of TH44 gates in the staged-up THmm gate is five. With two transistor delays in each level of TH44 gates, the total is 10 transistor levels of delay for a 1024 bit-width register. This is a modest example, and larger TFBC circuits are realistic. In [15] a comparison between a pipelined and non-pipelined NCL ALU resulted in an area increase of 100% and a throughput increase of only 1.32. Other improvement methods are found in [16, 17]. However, the THmm gate in the TFBC is typically a source of significant area and delay, and to make NCL practical and generally accepted, it is important to minimize FB circuit area and delay. Section III describes EMC, a two-transistor level Muller C-Element to reduce TFBC area and delay.
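The staged-up cost can be tallied with a few lines of code. This is a rough sketch under stated assumptions: every cell in the tree is counted as a full 12-transistor semi-static TH44 with two transistor levels of delay (the figures quoted for Fig. 5), partially filled cells are not discounted, and the function name `staged_thmm_cost` is hypothetical.

```python
def staged_thmm_cost(m, fan_in=4, transistors_per_gate=12, delays_per_gate=2):
    """Estimate the cost of a THmm gate built by staging up TH44 cells.

    Returns (gates, transistors, transistor_levels).
    Assumption: each cell costs `transistors_per_gate` transistors,
    even when it is only partially filled.
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    gates = 0
    stages = 0
    signals = m
    while signals > 1:
        # Each stage groups the remaining signals into cells of up to
        # `fan_in` inputs (integer ceiling division).
        signals = -(-signals // fan_in)
        gates += signals
        stages += 1
    # `stages` equals ceil(log_4 m) when fan_in is 4.
    return gates, gates * transistors_per_gate, stages * delays_per_gate


for m in (16, 1024):
    print(m, staged_thmm_cost(m))
```

For m = 16 this reproduces the five cells, 60 transistors, and four transistor delays quoted for Fig. 5 (b), and for m = 1024 it gives the 10 transistor levels of the FFT example.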

## III. NEW IMPROVED MULLER C-ELEMENT

The proposed Muller C-Element, EMC, shown in Fig. 2 is a form of transistor-resistor logic. It is designed to reduce delay and area for NCL FB circuits. The *pull-down* and *pull-up* networks in traditional CMOS logic gates are usually complementary networks of series-*AND* and parallel-*OR* connected nMOS and pMOS transistors. Single stage gates are typically limited to no more than four inputs due to restrictions on the number of series-*AND* connections. The traditional NCL THmm threshold gates, like the one in Fig. 5 (a), are similar but worse. The *pull-down* and *pull-up* networks in traditional *set to* '1' and *reset to* '0' NCL CMOS THmm threshold gates are both composed of m series-*AND* connected nMOS and pMOS transistors (it should be noted that the *pull-down* and *pull-up* values are inverted to provide the *set to* '1' and *reset to* '0' output Z values). Again, the maximum m is limited to four (perhaps five) in most technology nodes. However, there is no such limit on the parallel-*OR* connected transistors.

A big advantage of the EMC circuit in Fig. 2 is the parallel-OR connection leveraged to eliminate the limit on the number of m inputs. To i) make sure neither the set nor the reset subcircuits draw power when all m inputs are either '0' or '1' (circuit at rest) and ii) to make sure that while inputs transition from '0' to '1' (or vice versa) only one resistive transistor is active at any given time, EMC uses the set and reset subcircuit shown in Fig. 6.



Fig. 6. EMC reverse logic set to '1' and reset to '0' subcircuit.

In Fig. 6, the inverted output, Zb, of the TH*mm* gate is fed back to control the gates of the nMOS and pMOS active resistors. Depending on the state of the output Z (and likewise Zb), only one of the two active resistors will ever be ON at any given time. In Fig. 6, the source terminal of the pMOS active resistor is located at the drain terminals of the parallel pMOS input transistors. Similarly, the source terminal of the nMOS active resistor is at the drain terminals of the parallel nMOS input transistors. This guarantees there is no path from VDD to VSS when all *m* inputs are either at an all '0' state or an all '1' state. This topology saves power when the circuit is at rest (all '0' or all '1' input state) or when the inputs are transitioning between states (some inputs '0' and some '1').

The second main component of the EMC THmm gate is the write subcircuit shown in Fig. 2 and Fig. 7. It is controlled by the reverse logic signals X and Y, and it is based on a modified t-gate. Since node Y will only be asserted ('1') when all m inputs are reset to '0,' this is the only case when transistor Y will be ON, pulling Z down to a reset value of '0.' The rest of the time, transistor Y will be OFF with no effect on Z. Similarly, since node X will only be asserted ('0') when all m inputs are set to '1,' this is the only case when transistor X will be ON, pulling Z up to a set value of '1.' The rest of the time, transistor X will be OFF with no effect on Z. The widths of the nMOS and pMOS transistors in the write circuit are based on the type of output stage chosen. If a traditional semi-static, cross-coupled inverter is chosen, the widths of the transistors are sized so that when they are ON, they have a low enough resistance to override the cross-coupled FB inverter that drives Z and is driven by Zb. If the sense-amplifier output stage is chosen, the widths of the nMOS and pMOS transistors in the write subcircuit can be minimized to save area.
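At the logic level, the X/Y write behavior described above can be compared against (1). The sketch below is a purely behavioral abstraction and makes several assumptions: it ignores transistor sizing and analog effects, it treats the Zb-gated active resistors as affecting only power and not the logic values of X and Y, and the function names `emc_next`, `eq1_next`, and `equivalent_up_to` are hypothetical.

```python
from itertools import product


def emc_next(inputs, z_prev):
    """Behavioral sketch of the EMC set/reset and write subcircuits."""
    y = int(not any(inputs))  # Y asserted ('1') only when all inputs are '0'
    x = int(not all(inputs))  # X asserted (active-low '0') only when all inputs are '1'
    if x == 0:
        return 1              # write transistor X ON: Z pulled up to '1'
    if y == 1:
        return 0              # write transistor Y ON: Z pulled down to '0'
    return z_prev             # both OFF: output stage holds Z


def eq1_next(inputs, z_prev):
    """Reference model: Eq. (1) evaluated directly."""
    return int(all(inputs) or (any(inputs) and bool(z_prev)))


def equivalent_up_to(max_m):
    """Exhaustively compare both models for 1 <= m <= max_m."""
    for m in range(1, max_m + 1):
        for inputs in product((0, 1), repeat=m):
            for z_prev in (0, 1):
                if emc_next(inputs, z_prev) != eq1_next(inputs, z_prev):
                    return False
    return True


print(equivalent_up_to(6))
```

Under these assumptions the two models agree for every input pattern and previous state, which is consistent with the set, reset, and hold cases described for transistors X and Y.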



Fig. 7. Write subcircuit for EMC THmm gate.



Fig. 8. EMC output module options: (a) semi-static, cross-coupled inverter version, and (b) sense-amplifier output latch option.

The last piece of the EMC Muller C-Element gate shown in Fig. 2 and Fig. 8 is the output stage. It serves two primary functions: to implement the required digital hysteresis when Z is set and reset, and to minimize the load on the write circuit. Fig. 8 (a) shows the more traditional semi-static cross-coupled inverter output stage, and Fig. 8 (b) shows the optional sense-amplifier output stage version. The semi-static version shown in Fig. 8 (a) uses one fewer transistor. However, as discussed above, it does require wider transistors in the write circuit. The sense-amplifier version in Fig. 8 (b) performs the THmm hysteresis function and holds the value of Z until conditions are met to flip its value. The sense-amplifier version requires one more nMOS transistor to implement its current source; however, it allows minimum-width transistors in the write circuit. Both options are provided here for consideration. Fig. 2 shows the two implementations of the efficient Muller C-Element, EMC, presented above.

## IV. SUMMARY AND CONCLUSIONS

There are several advantages to clockless asynchronous digital design [7]. Examples include: 1) the asynchronous nature of logic switching minimizes opportunities for power, electromagnetic radiation, temperature and other SCAs; 2) digital noise reduction for sensitive, mixed-signal ICs; 3) data is processed at average speed versus worst case for synchronous sequential circuits; and 4) the difficult clock-routing step is eliminated from the design flow. Some drawbacks include logic area increase, dual-rail wires for all signal nets, and a lack of dedicated computer-aided design tools for asynchronous circuit design. To make asynchronous design more acceptable, these drawbacks need to be mitigated, and asynchronous technologies like NCL need to be easier to implement.

To reduce area and decrease propagation delay, a new EMC THmm Muller C-Element implementation was presented that has only two transistor levels of delay regardless of the input bit-width, m, and that is 46% smaller than traditional, staged-up versions in terms of transistor count. It is especially useful for applications like the TFBC that can have high input bit-widths (it should also be noted that since the new cell contains the Zb node, it can be used directly to implement the TFBC for NCL asynchronous register applications). By controlling the gates of the active resistors using the Zb output as a feedback value, power is reduced by only turning ON the active resistors during input transition. Similarly, by connecting the source terminals of the active resistors directly to the NAND and NOR network nodes (instead of directly to the supply nodes), the new implementation minimizes static power consumption when the circuit is at rest (when all inputs are either at reset '0' or set '1').
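The transistor-count comparison can be approximated from the figures given earlier: 2m + 8 (semi-static) or 2m + 9 (sense-amplifier) transistors for EMC from the Fig. 2 caption, and 12 transistors per TH44 cell from Fig. 5. The following is a rough sketch, not the authors' analysis: it assumes a full tree of 12-transistor TH44 cells with no extra inverters or buffers, and the function names are hypothetical.

```python
def staged_transistors(m, fan_in=4, per_gate=12):
    """Transistors in a staged-up THmm tree of TH44 cells (assumed model)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    gates = 0
    signals = m
    while signals > 1:
        signals = -(-signals // fan_in)  # ceiling division per stage
        gates += signals
    return gates * per_gate


def emc_transistors(m, sense_amp=False):
    """EMC transistor count from the Fig. 2 caption: 2m + 8 or 2m + 9."""
    return 2 * m + (9 if sense_amp else 8)


def savings(m, sense_amp=False):
    """Fractional reduction in transistor count of EMC versus staged-up."""
    return 1.0 - emc_transistors(m, sense_amp) / staged_transistors(m)


for m in (16, 64, 256, 1024):
    print(f"m={m:5d}  staged={staged_transistors(m):5d}  "
          f"EMC={emc_transistors(m):5d}  savings={savings(m):.1%}")
```

Under these assumptions the semi-static EMC needs 40 transistors versus 60 at m = 16 and 2056 versus 4092 at m = 1024, so the reduction approaches one-half for large m. The same count gives roughly 46% at m = 64, although the text does not state which bit-width its 46% figure refers to.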

## REFERENCES

- [1] P. Kocher, J. Jaffe, and B. Jun, "Differential power analysis," in *Annu. Int. Cryptology Conf.*, Springer, 1999, pp. 388–397.
- [2] M. Tehranipoor and F. Koushanfar, "A survey of HW Trojan taxonomy and detection," *IEEE Des. Test Comput.*, vol. 27, 2010, pp. 10–25.
- [3] L. Lin, W. Burleson, and C. Paar, "MOLES: malicious off-chip leakage enabled by side-channels," in *IEEE/ACM Int. Conf. on CAD (ICCAD)*, Nov. 2009, pp. 117–122.
- [4] S. Moore, R. Anderson, P. Cunningham, R. Mullins, and G. Taylor, "Improving smart card security using self-timed circuits," in *Proc. of the Eighth Int. Symp. on Asynchronous Circuits and Syst., IEEE Computer Soc.*, Silver Spring, MD, 2002, pp. 211–218.
- [5] R. Sridhar, "Asynchronous design techniques," in *Proc. of the Fifth Annu. IEEE Int. ASIC Conf.*, Sep. 1992, pp. 296–300.
- [6] K. Fant and S. Brandt, "NULL convention logic: a complete and consistent logic for asynchronous digital circuit synthesis," in *Proc. of the Int. Conf. on Appl. Specific Syst., Architectures and Processors*, Aug. 1996, pp. 261–273.
- [7] D. E. Muller, "Theory of asynchronous circuits," Rep. no. 66, Digital Comput. Lab., Univ. of Illinois at Urbana-Champaign, 1955.
- [8] D. E. Muller and W. S. Bartky, "A theory of asynchronous circuits," in *Proc. of the Int. Symp. on Theory of Switching, Part 1*, Harvard Univ. Press, 1959, pp. 204–243.
- [9] Y. A. Stepchenkov et al., "H flip-flop," RU Patent 2371842, Oct. 2009.
- [10] C. P. Taylor, "Analysis of a two-stage voltage divider for logic reduction in an asynchronous register," M.S. thesis, Dept. Elect. Eng., Wright State Univ., 2014.
- [11] S. Fairbanks and C. E. Molnar, "One-hot Muller C-elements and circuits using one-hot Muller C-elements," U.S. Patent 6 486 700 B1, Nov. 2002.
- [12] S. Fairbanks, "Two-stage Muller C-element," Patent WO 01/22591 A1, Mar. 2001.
- [13] S. L. Lu, "Improved design of CMOS multiple-input Muller C-elements," *Electron. Lett.*, Sep. 1993, pp. 1680–1682.
- [14] S. C. Smith, "Gate and throughput optimizations for NULL convention self-timed digital circuits," Ph.D. dissertation, Univ. of Central Florida, 2001.
- [15] K. Bandapati and S. C. Smith, "Design and characterization of NULL convention arithmetic logic units," in *The 2003 Int. Conf. on VLSI*, Jun. 2003, pp. 178–184.
- [16] S. C. Smith, "Speedup of self-timed digital systems using early completion," in *The IEEE Comput. Soc. Annu. Symp. on VLSI*, Apr. 2002, pp. 107–113.
- [17] S. C. Smith, "Completion-completeness for NULL convention digital circuits utilizing the bit-wise completion strategy," in *The 2003 Int. Conf. on VLSI*, Jun. 2003, pp. 143–149.