Performance optimization of the 4-bit absolute-value detector

Absolute-value detection is a popular algorithm for sorting spiking signals, and the 4-bit absolute-value detector is a fundamental circuit in the spike-sorting field. This paper presents three distinct approaches to designing an absolute-value detector, along with the optimization techniques used in these circuits. The paper begins by discussing each circuit's structure and then uses logical effort theory to calculate the critical path delays. Next, it compares the critical path delays and transistor counts of the three designs. Finally, it identifies the optimal design among the three circuits, the one that provides competitive delay performance at the lowest transistor cost. Applying optimization techniques to the optimal absolute-value detector resulted in a significant improvement in its performance. These findings can help further refine and enhance the hardware that implements spike-sorting algorithms. Overall, this paper provides insights into the design and optimization of absolute-value detectors that will be of interest to researchers and practitioners in the field.


Introduction
As the most important and complex organ of the human body, the brain controls most of the body's activities. In the human brain, billions of neurons generate pulse signals to transmit information and communicate with other cells, thereby achieving the corresponding functions. Spike-detection algorithms are widely used in neural signal-detecting systems and have played an essential role in building them [1]. These algorithms apply a variety of mathematical and statistical methods to detect spikes against heavy background noise. By applying spike-detection algorithms, researchers can detect useful signals accurately and efficiently. Absolute-value detection is one of the most extensively used spike-detection algorithms [2]. An absolute-value detector takes a multi-bit input signal and compares it to a pre-set threshold, a digital signal with a fixed magnitude. The detector then generates a one-bit output that indicates whether the amplitude of the input signal exceeds the given threshold. This detector has good application prospects in many important fields, such as brain-machine interfaces and analog signal processing [3,4].
The four-bit absolute value detector is mostly realized in Complementary Metal Oxide Semiconductor (CMOS) technology. With the rapid development of integrated circuits, the performance requirements for detectors have risen. The delay of the detector needs to be minimized to achieve a higher operating frequency, which means the detector can process more input digital signals in a given amount of time. There are also constraints on the power consumption of the absolute value detector so that it can achieve a higher level of integration.
In recent years, several circuit topologies have been proposed to improve the performance of the absolute-value detector. Dong proposed a novel design of the absolute value detector that uses a chain carry adder and the 2's complement operation [5]. She divided the design into two cases according to the MSB of the input signal. When the input signal is positive, it is converted to its 2's complement value and the subtraction with the threshold signal is performed. When the input signal is negative, it is left unprocessed and only an addition with the given threshold signal is performed. To optimize the circuit's performance, she avoided high fan-in gates in the circuit topology to minimize the parasitic delay, shortened the critical path to minimize the critical-path delay, and replaced traditional comparators with chain adders to reduce the number of transistors used. Yao put forward a circuit structure that uses a truth table to build the comparator and uses half adders and transmission gates to obtain the absolute value of the input signal [6]. To optimize the performance of the circuit, he used pass-transistor logic gates instead of traditional static CMOS logic gates. Lai developed a novel design of an absolute value converter using a truth table [7]. He applied gate sizing and supply voltage scaling to optimize circuit metrics such as power consumption and critical path delay.
Although there are many studies on the design and optimization of the 4-bit absolute value detector, few papers compare and summarize these design methods. This paper is organized as follows. Section 2 introduces logical effort theory, a fundamental principle for optimizing CMOS circuits, in detail, and discusses the functions of the two modules of the absolute value detector. Section 3 introduces three different circuit topologies and the optimization methods applied to them, and then compares the advantages and disadvantages of the three circuits. The conclusion summarizes the optimal circuit structure and the corresponding optimization methods based on the preceding analysis.

Principles of absolute-value detector
To realize the function of this absolute value detector, two modules are typically required: the absolute module and the comparator module. Figure 1 shows the basic diagram of the absolute-value detector. A 4-bit digital input signal is applied to the detector, which generates a 1-bit output indicating whether the signal's magnitude is larger than the pre-set threshold. The absolute module takes the input signal and outputs its magnitude. The comparator module takes the magnitude generated by the absolute module and outputs the comparison result. In this paper, A3A2A1A0 represents the 4-bit input signal, the magnitude of this signal is M2M1M0, the threshold is T2T1T0, and the output is Out.

Absolute module design
As described above, the input of this module is A3A2A1A0. The signal is in 2's complement form, so the most significant bit determines whether the signal is positive or negative [7]. If the MSB of the signal is 1, the value of the signal is negative. Otherwise, the value of the signal is positive. For example, the input A3A2A1A0 = 0011 represents the value 3, and the input A3A2A1A0 = 1011 represents the value -5. The function of this module is to convert a signed input signal into an unsigned signal equal to its absolute value. All inputs fall into two cases. When the input signal is positive, its absolute value in binary format is M2M1M0 = A2A1A0. When the input signal is negative, its absolute value in binary format is $M_2M_1M_0 = \overline{A_2}\,\overline{A_1}\,\overline{A_0} + 1$.
There are two possible ways to build the logic for the absolute value module. The first method is to use a Karnaugh map to obtain the expression for each output bit directly and then realize the function with logic gates. The second method is to perform an addition that depends on the input. If the input signal is negative, inverters invert each bit of the input except the MSB, which does not appear at the output. Adding 1 to the inverted input then yields the correct absolute value.
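The invert-and-add-one method can be sketched in a few lines of Python. This is a behavioral model only, not a gate-level design, and the function name `abs_module` is an assumption introduced for illustration.

```python
def abs_module(a: int) -> int:
    """Return the 3-bit magnitude M2M1M0 of a 4-bit two's-complement input A3A2A1A0."""
    if not 0 <= a <= 0b1111:
        raise ValueError("input must be a 4-bit value")
    sign = (a >> 3) & 1          # A3, the MSB
    low = a & 0b111              # A2A1A0
    if sign == 0:
        return low               # positive: M2M1M0 = A2A1A0
    # negative: invert A2A1A0 and add 1, keeping only three bits
    return ((~low & 0b111) + 1) & 0b111


print(format(abs_module(0b0011), "03b"))  # input +3
print(format(abs_module(0b1011), "03b"))  # input -5
```

Note that the input 1000 (-8) has a magnitude of 8, which does not fit in three bits; in this sketch it wraps around to 000.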

Comparator module design
The comparator module takes the 3-bit magnitude generated by the absolute module as input and compares it to a given threshold value, T2T1T0. If M2M1M0 > T2T1T0, the 1-bit output is 1. Otherwise, the output is 0. Both binary numbers are non-negative.
There are two possible ways to implement the comparator. One way is to use a chain adder. Since a subtractor and an adder have the same structure, a chain adder can perform subtraction if the proper inputs are provided. The only result that matters is the output carry of the final adder, so only the carry logic of each adder is needed. The adder stages can therefore be implemented with mirror adders, half adders, full adders, or majority gates. The other way is to compare the two binary numbers from MSB to LSB. For example, if M2 and T2 have different values, the result can be determined immediately. If M2 and T2 have the same value, M1 and T1 are compared next.
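The two comparator strategies can be checked against each other with a small behavioral model. The sketch below assumes the chain adder computes T - M as T plus the inverted bits of M plus 1 and that the output is the complement of the final carry; the function names are hypothetical.

```python
def compare_msb_first(m: int, t: int) -> int:
    """Return 1 if the 3-bit value m is greater than the 3-bit value t, else 0."""
    for bit in (2, 1, 0):                 # scan from MSB to LSB
        m_bit = (m >> bit) & 1
        t_bit = (t >> bit) & 1
        if m_bit != t_bit:                # the first differing bit decides
            return 1 if m_bit > t_bit else 0
    return 0                              # all bits equal: m is not greater


def compare_carry_chain(m: int, t: int) -> int:
    """Same comparison using only the carry logic of a 3-bit adder chain.

    Computes the carry-out of t + (~m) + 1, i.e. t - m. The carry-out is 1
    exactly when t >= m, so the comparison result is its complement.
    """
    carry = 1                             # the "+1" of the two's complement
    for bit in range(3):                  # ripple from LSB to MSB
        x = (t >> bit) & 1
        y = ((m >> bit) & 1) ^ 1          # inverted bit of m
        carry = (x & y) | (x & carry) | (y & carry)   # majority function
    return carry ^ 1


# Both formulations agree with m > t on every pair of 3-bit inputs.
assert all(
    compare_msb_first(m, t) == compare_carry_chain(m, t) == int(m > t)
    for m in range(8) for t in range(8)
)
```

Only the majority (carry) function appears in the second version, which is why the sum logic of the adders can be omitted.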

Logical effort theory
Logical effort theory is a design methodology that is widely used to optimize the performance of integrated circuits. It was first proposed by Ivan Sutherland in 1991 and has been applied in industry since then. The theory provides a simple delay model for complicated circuits.
To apply this theory, the size of the unit-size inverter must first be defined. Usually, for a unit-size inverter with equal channel lengths for the PMOS and NMOS transistors, the PMOS width is approximately twice the NMOS width. That is: $W_p \approx 2W_n$. The unit-size inverter is expected to have equal rising and falling delays. The carriers in the channel of the NMOS transistor are electrons, while holes are the main carriers in the channel of the PMOS transistor, and these two carriers have different mobilities. Therefore, the PMOS transistor is made wider than the NMOS transistor so that its pull-up strength matches the pull-down strength of the NMOS transistor, which gives the inverter equal rising and falling delays. The delay of this unit-size inverter is defined as τ.
After the parameters of the unit-size inverter are determined, six basic parameters are introduced in this theory: g, h, f, p, b and d. All basic CMOS logic gates have these parameters. The logical effort g is the ratio of the input capacitance of the logic gate to the input capacitance of the unit-size inverter: $g = C_{in,gate} / C_{in,inv}$. The electrical effort h is the ratio of the capacitance at the output of the logic gate to the input capacitance of the logic gate: $h = C_{out} / C_{in}$. The effort delay f is the product of g and h, and it represents one part of the delay of a logic gate: $f = g h$. The other part of the delay of the logic gate is the parasitic delay, denoted by p. It represents the delay of a logic gate that does not drive any load. This part is caused by the parasitic capacitance of the gate itself, namely the diffusion capacitance of its PMOS and NMOS transistors.
When the circuit has different branches, the branching effort b is used to quantify the effect of branching on the total delay: $b = (C_{onpath} + C_{offpath}) / C_{onpath}$. In this formula, $C_{onpath}$ represents the load capacitance of the next logic gate along the path and $C_{offpath}$ represents the load capacitance of the other connected logic gates that are not on the path. With the branching effort b, and with h measured against the on-path load, the formula of f becomes $f = g h b$. The normalized delay d of one logic gate is then defined using the previous five parameters: $d = f + p = g h b + p$. The unit of d is τ, the delay of the unit-size inverter.
For a circuit with a chain of N logic gates on the path, the total delay is $D = \sum_{i=1}^{N} d_i = \sum_{i=1}^{N} g_i h_i b_i + \sum_{i=1}^{N} p_i$. For simplicity, G, H, B, P and F are used to represent the path's logical effort, electrical effort, branching effort, parasitic delay and total path effort: $G = \prod_i g_i$, $H = \prod_i h_i$, $B = \prod_i b_i$, $P = \sum_i p_i$ and $F = G B H$. The purpose of this method is to minimize the total delay D when F is fixed, which requires the effort delays of all stages to be equal. Given the number of stages N, the optimal value of f is $\hat{f} = F^{1/N}$, and the minimum delay is $D = N F^{1/N} + P$. The size of each gate can then be obtained from $C_{in,i} = g_i b_i C_{onpath,i} / \hat{f}$.
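As a rough illustration of how these quantities combine, the following Python sketch evaluates the minimum path delay for a small example. The three-stage path, its load, and the gate parameters (an inverter with g = 1, p = 1 and a 2-input NAND with g = 4/3, p = 2, the usual textbook values under the 2:1 sizing assumption) are assumptions chosen for illustration and are not taken from the circuits in this paper.

```python
import math


def path_delay(stages, electrical_effort):
    """Estimate the minimum delay (in units of tau) of a chain of logic gates.

    stages: list of (g, b, p) tuples, one per gate on the path.
    electrical_effort: path electrical effort H = C_out / C_in.
    Returns (minimum delay, optimal stage effort).
    """
    n = len(stages)
    G = math.prod(g for g, _, _ in stages)    # path logical effort
    B = math.prod(b for _, b, _ in stages)    # path branching effort
    P = sum(p for _, _, p in stages)          # path parasitic delay
    F = G * B * electrical_effort             # path effort F = G * B * H
    f_opt = F ** (1.0 / n)                    # equal effort in every stage
    return n * f_opt + P, f_opt


# Hypothetical path: inverter -> 2-input NAND -> inverter, no branching,
# driving a load eight times the input capacitance of the first gate.
INV = (1.0, 1.0, 1.0)            # (g, b, p), assumed textbook values
NAND2 = (4.0 / 3.0, 1.0, 2.0)    # (g, b, p), assumed textbook values
delay, f_opt = path_delay([INV, NAND2, INV], 8.0)
print(f"stage effort = {f_opt:.2f}, minimum delay = {delay:.2f} tau")
```

The same function can be reused for each critical path once the per-gate values of g, b and p have been read off the schematic.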

Circuits and optimization methods
In this section, three different designs of absolute-value detector circuits are introduced. These three designs apply different circuit topologies and optimization methods to minimize the critical path delay. The designs are then compared in terms of their critical path delays and the number of transistors used. After the comparison, the design that best balances speed, area, and energy cost is selected as the optimal one.

Design method 1: use truth table and Karnaugh map
The first design uses a truth table and a Karnaugh map to obtain the logical expressions of the outputs of the absolute module and the comparator module. The circuit is built from these logical expressions. In the absolute module, when A3 is 0, the magnitude M2M1M0 is equal to A2A1A0; when A3 is 1, M2M1M0 is equal to $\overline{A_2}\,\overline{A_1}\,\overline{A_0} + 1$. The logical expressions of the three outputs are derived from the Karnaugh map. For the comparator module, the logical expression of the 1-bit output is obtained from the truth table in the same way. With these expressions, the circuit structure is built by combining the two modules, as shown in Figure 2.
After the structure of the circuit is obtained, the next step is to analyze and optimize the critical path delay. Two optimization methods are applied to this circuit. The first is to minimize the parasitic delay on the critical path. In this circuit, the critical path contains a 4-input NOR gate and an XOR gate, and the parasitic delays of these two gates are relatively large. Therefore, they are replaced by gates with smaller parasitic delays, such as 2-input logic gates, and the logical expressions for M1 and M2 are rewritten accordingly.
The second optimization method is to reduce the branching in the circuit. According to the delay equation introduced in Section 2, the branching effort b contributes directly to the total delay, so a smaller b gives a smaller delay. In this circuit, branches appear at the interface between the two modules, which increases the total delay. By combining the two modules into one stage, the branches can be reduced.
After these two methods are applied, the final circuit is shown in Figure 3.
Figure 3. The final version of the absolute value detector circuit [8].
After the circuit schematic is determined, logical effort theory is used to estimate the worst-case delay of this circuit. The critical path extracted from the circuit is shown in Figure 4.
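The truth-table approach of this design can be illustrated with a short Python check. The Boolean expressions below are derived here directly from the 2's complement rule using only 2-input operations; they are an assumption for illustration and are not necessarily the expressions used in [8]. The function names are hypothetical.

```python
def abs_expressions(a3, a2, a1, a0):
    """Magnitude bits (M2, M1, M0) written with 2-input AND/OR/XOR operations."""
    m0 = a0                          # the LSB is unchanged by negation
    m1 = a1 ^ (a3 & a0)              # flips when negative and A0 = 1
    m2 = a2 ^ (a3 & (a1 | a0))       # flips when negative and A1 or A0 = 1
    return m2, m1, m0


def greater_than(m2, m1, m0, t2, t1, t0):
    """Out = 1 when M2M1M0 > T2T1T0, as a sum-of-products expression."""
    eq2 = 1 ^ (m2 ^ t2)              # M2 and T2 are equal
    eq1 = 1 ^ (m1 ^ t1)              # M1 and T1 are equal
    return (m2 & (1 ^ t2)) | (eq2 & m1 & (1 ^ t1)) | (eq2 & eq1 & m0 & (1 ^ t0))


def abs_reference(a):
    """3-bit magnitude from the invert-and-add-one rule, used as the truth table."""
    low = a & 0b111
    return low if a < 8 else (8 - low) & 0b111


# Exhaustive truth-table check over all 16 inputs and 8 thresholds.
for a in range(16):
    bits = [(a >> i) & 1 for i in (3, 2, 1, 0)]
    m2, m1, m0 = abs_expressions(*bits)
    m = (m2 << 2) | (m1 << 1) | m0
    assert m == abs_reference(a)
    for t in range(8):
        t_bits = [(t >> i) & 1 for i in (2, 1, 0)]
        assert greater_than(m2, m1, m0, *t_bits) == int(m > t)
```

Writing M1 and M2 with 2-input gates only is in the spirit of the first optimization method, which avoids high fan-in gates on the critical path.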

Design method 2: use adders and cascaded comparators
This design consists of three main components: an adder, a multiplexer and a comparator. The diagram of the circuit is shown in Figure 5.
Figure 5. The circuit diagram of the absolute value detector [9].
For the absolute module, the circuit uses an adder to obtain the absolute value of the input. If the most significant bit of the signal is 1, the signal is negative, so A2A1A0 is inverted and 1 is added to obtain the magnitude of the signal. This is why an adder is used in this part. The multiplexer selects either the input signal or the output of the adder, depending on the value of A3. For the comparator, the comparison starts with the MSBs of the two signals so that the result is available as early as possible. If M2 and T2 have different values, the output can be determined and no comparison of the lower bits is needed. Therefore, a cascaded structure is applied to the comparator so that some results can be generated and propagated to the output quickly by comparing only the highest bit.
To optimize the critical path delay of the circuit, the XOR gate on the critical path is redesigned using transmission gate logic. The improved design of the 2-input XOR gate is shown in Figure 6. Compared with the traditional XOR gate, this new XOR gate has a smaller logical effort and a smaller parasitic delay, which reduces the total delay of the circuit.
With this improvement, the new critical path of this circuit is shown in Figure 7.

Design method 3: use mirror adders plus multiplexers
In this design, the absolute-value detector is built using mirror adders and multiplexers. Some designs use full adders and half adders to obtain the absolute value of the signal, but mirror adders are an improvement over these traditional designs. Compared to full adders, mirror adders need fewer transistors because they only have to produce the carry-out bit and not the sum bit. Only the carry signals are needed to determine whether the magnitude of the input signal is larger than the pre-set threshold. The multiplexers are built from transmission gates because each gate contains only two transistors, which saves area and energy. The diagram of this circuit is shown in Figure 8.

Figure 8. The diagram of the 4-bit absolute value detector [10].
In Figure 8, the input signal is X3X2X1X0, the threshold signal is T2T1T0, and the output bit is OUT. To get the comparison result, the adder chain subtracts the magnitude M2M1M0 from the threshold T2T1T0. If the output carry signal of the last mirror adder is 1, the threshold is at least as large as the magnitude of the input signal. If the output carry signal is 0, the input signal has the larger magnitude.
If the input signal is positive, the circuit computes T - X, which is T plus the bitwise inverse of X plus 1. The multiplexers therefore select the inverses of X0, X1 and X2, and the inverse of X3 (here 1) is loaded into the first mirror adder to supply the added 1. If the input signal is negative, the circuit computes T - (-X), which is equivalent to T + X, so the multiplexers select X0, X1 and X2, and the inverse of X3 (now 0) is again loaded into the first mirror adder. In both cases, the output carry bit of the last mirror adder is the comparison result.
To optimize the delay of this circuit, the critical path must be identified first. It is the longest path along which a signal propagates from the input to the output, and it is shown in Figure 9.
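The selection rule described above can be modeled behaviorally as follows. This is a sketch under the assumption that the inverse of X3 acts as the carry-in of the first stage and that each mirror adder contributes only its majority (carry) function; the names `carry` and `mirror_adder_detector` are hypothetical.

```python
def carry(x, y, c):
    """Carry-out of one mirror-adder stage: the majority of its three inputs."""
    return (x & y) | (x & c) | (y & c)


def mirror_adder_detector(x, t):
    """Carry-out of the last stage for a 4-bit input x and a 3-bit threshold t."""
    x3 = (x >> 3) & 1
    c = x3 ^ 1                               # inverse of X3 feeds the first stage
    for i in range(3):
        xi = (x >> i) & 1
        sel = xi if x3 else xi ^ 1           # multiplexer: X_i if negative, else its inverse
        c = carry((t >> i) & 1, sel, c)
    return c


def signed_value(x):
    """Interpret a 4-bit pattern as a two's-complement integer."""
    return x - 16 if x & 0b1000 else x


# The final carry is 1 exactly when the threshold is at least |X|.
for x in range(16):
    for t in range(8):
        assert mirror_adder_detector(x, t) == int(t >= abs(signed_value(x)))
```

In this model no separate absolute module is needed: the multiplexers and the carry-in together fold the sign handling into the subtraction.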

Comparison and selection of designs
With three typical implementations of the absolute-value detector introduced, the designs must be compared to determine which one is optimal. Critical path delay is one of the most important metrics for a digital integrated circuit. It is the largest combinational logic delay from input to output, and it determines the speed of the digital circuit.
The number of transistors used in a circuit is also an important factor in its performance. A circuit that uses a large number of transistors occupies more area and has a large total capacitance, which leads to high leakage and dynamic power consumption.
For the three circuits discussed above, a comparison of the critical path delay and the number of transistors used is shown in Table 1.

Table 1. A comparison of the three circuits.

Metric                                     Combinational logic (truth table)   Adder plus comparator   Mirror adder and multiplexers
Number of stages on the critical path      9                                   10                      11
Total number of transistors used           184                                 76                      50
Critical path delay                        34.2                                39.84                   38.7

From the comparison, the first design has the fewest stages on the critical path and achieves the minimum delay, but it uses far more transistors than the other two designs. The second design uses a relatively small number of transistors, but its critical path delay is the largest of the three. The third design, which applies mirror adders and multiplexers, uses the fewest transistors while keeping its critical path delay below that of the second design. It is therefore the optimal design among the three in terms of the trade-off between worst-case delay and total transistor count.
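To make the trade-off in Table 1 more concrete, the relative differences can be computed with a short script. The numbers are those of Table 1; expressing them as percentages relative to the first design is a choice made here for illustration.

```python
# Values copied from Table 1.
designs = {
    "truth table": {"stages": 9, "transistors": 184, "delay": 34.2},
    "adder plus comparator": {"stages": 10, "transistors": 76, "delay": 39.84},
    "mirror adder and multiplexers": {"stages": 11, "transistors": 50, "delay": 38.7},
}


def relative_to(name, baseline, key):
    """Percentage change of `key` for design `name` relative to `baseline`."""
    base = designs[baseline][key]
    return 100.0 * (designs[name][key] - base) / base


for name in designs:
    d_delay = relative_to(name, "truth table", "delay")
    d_count = relative_to(name, "truth table", "transistors")
    print(f"{name}: delay {d_delay:+.1f}%, transistors {d_count:+.1f}%")
```

Relative to the truth-table design, the mirror-adder design uses roughly 73% fewer transistors at the cost of roughly 13% more delay.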

Conclusions
With the rapid development of research fields such as brain-machine interfaces, medicine development, and artificial intelligence, spike-sorting algorithms now have broader application prospects. The 4-bit absolute-value detector is widely used in fields such as neural signal acquisition systems, brain-machine interfaces and analog signal processing. Optimizing the performance of the absolute-value detector is therefore of great importance for improving the performance of signal processing systems.
This paper first presents three different designs of absolute-value detector circuits. The first design uses a truth table and a Karnaugh map to obtain the logical expressions of the outputs directly and builds the circuit from these expressions. The second design applies adders in the absolute value module to obtain the magnitude of the input signal, and a cascaded comparator in the comparator module to obtain the result as quickly as possible. The third design uses mirror adders and transmission-gate multiplexers to subtract the absolute value of the signal from the threshold. The paper then compares the performance of these three designs and finds that the third design is optimal among the three circuits in terms of the balance between critical path delay and transistor count. It uses only 50 transistors in the whole design and has a critical path delay of 38.7. This circuit achieves this for two main reasons. First, using mirror adders is a significant improvement over traditional designs that use full adders, because it reduces the number of transistors. Second, the transmission-gate multiplexers also reduce the number of transistors. This paper summarizes several implementation and optimization methods for the absolute-value detector, which can serve as a reference for further application and improvement of this circuit.

Figure 1. Basic diagram of the absolute value circuit.

Figure 2. The structure of the 4-bit absolute value detector [8].

Figure 4. The critical path of the absolute value detector [8]. The delay of this path is calculated with logical effort theory, which gives the critical path delay of 34.2 reported for this design in Table 1.

Figure 7. Critical path of the whole circuit [9]. The total delay of this critical path is calculated with logical effort theory, which gives the critical path delay of 39.84 reported for this design in Table 1.

Figure 9. The critical path [10]. Once the critical path is identified, the minimum delay of this circuit is derived with logical effort theory, which gives the critical path delay of 38.7 reported for this design in Table 1.