Design of a 5.94 FO4 absolute-value detector based on hybrid logic technique

With the development of neural signal acquisition systems, spike sorting algorithms have become a hot topic in related fields. The delay of the spike detection system largely determines the speed of the overall signal acquisition system. This paper designs a low-delay absolute-value detector based on the GDI technique to improve the timeliness of the neural signal acquisition system. The final circuit has only 20 logic gates (56 transistors). According to the logical effort methodology, the final path delay is 5.94 FO4. This research can provide a reference for the design of future portable ultra-low-delay neural signal acquisition systems.


Introduction
The absolute value detector, an essential component of a neural signal acquisition system, fundamentally determines the upper limit of detection speed. In the conventional method, a tetrode [1] has four channels, and each channel is detected independently. When the detected value exceeds a threshold, it is recognized as a spike.
The purpose of this paper is to design a low-delay absolute value detector. The delay of the target circuit is reduced by minimising the number of MOSFETs [2]. In the design of this IC, the number of logic gates was first reduced by simplifying Boolean expressions, then hybrid logic structures such as GDI [3] and CMOS were used to reduce the number of MOSFETs inside the logic gates, and finally the transistor sizes were adjusted using the logical effort principle to obtain the final circuit. The resulting design performs real-time spike detection with very low delay, which benefits the real-time monitoring stage of spike sorting algorithms. At the same time, the smaller chip area allows a smaller detection device. For specific experimental scenarios, such as elderly people with limited mobility or small experimental environments, a small portable detector is preferred.
The design of the absolute value detector is divided into two parts: the 2's complement absolute value conversion circuit and the comparator circuit. The design is detailed as follows: Chapter 3 introduces the GDI technique.

Literature Review
With the development of neural signal acquisition systems in recent years, more research has been put into spike sorting algorithms [4,5]. The aim of spike detection, the second stage of such algorithms, is to detect how active a neuron is by comparing the magnitude of the target potential with the set potential. Along with the development of spike detection algorithms, more attention has been paid to the timeliness of such detection, that is, to faster detection systems that provide timely feedback on the status of the detected target. While the software algorithms are being updated rapidly [4,6], the hardware design needs to keep pace. Among the existing designs of absolute value detectors [2,7-9], most are integrated circuits built with the CMOS technique, and their delay is generally above 14 FO4. CMOS technology is almost 60 years old and has dominated chip manufacturing for a long time.
To further reduce the delay of a logic circuit, it is necessary to go beyond the existing design samples and use a new logic technology.

GDI Technique
Before introducing the circuit design, it is necessary to introduce the concept of GDI, the Gate Diffusion Input design technique, which targets low delay and small area in VLSI digital design. Unlike CMOS and PTL designs, GDI requires only two transistors to implement most logic gate functions [3]. The GDI cells used in this design are depicted in Figure 1, and the difference in transistor count compared to CMOS technology is shown in Table 1. Circuits such as adders built with GDI can have up to 60 per cent lower delay than those built with CMOS, and a large amount of simulation data indicates that GDI is better than CMOS in terms of delay.

2's Complement Circuit
In neuronal spike detection, the electrophysiological activity of a neuron can be considered as a continuously varying set of voltage signals. Because the amplitude of a neuronal spike can be positive or negative, we treat a signal whose absolute value exceeds the detection threshold as a digital "1" and any other signal as a digital "0", which converts the electrical signal into a digital one. In addition, the delay of the detector is determined by the delay of the critical path, which is the path with the highest delay in the circuit. This means that, all else being equal, the lowest possible delay requires keeping the number of logic gates in the detector as low as possible.
As mentioned above, the circuit design is split into two parts, the first being the 2's complement absolute value converter and the second being the comparator. Since the electrophysiological signal fluctuates between positive and negative values, the detector must handle inputs of both signs when comparing them with a preset threshold. Using separate detection circuits for positive and negative signals would increase the number of logic gates significantly, so the first problem the design needs to solve is how to detect both positive and negative values with just one circuit. We therefore adopt a method commonly used in computing to represent signed values: the 2's complement code. The input voltage signal is represented by a four-bit 2's complement code A3, A2, A1, A0, starting with the MSB (most significant bit), where A3 is the sign bit. The input can thus be expressed as a value from -7 to 7, i.e. from 1001 to 0111 in binary. Table 2 is the truth table of the input value.
As the MSB is used as the sign bit, only 3 bits are available for the magnitude of a positive value, the maximum being 7, i.e. 0111, so 1000 has no corresponding positive value in this design and is treated as 0000. With a way to enter both positive and negative values, the next step is to take the absolute value of the input: a negative input is converted to the corresponding positive value, while a positive input remains unchanged. In the 2's complement code, the absolute value of a negative number is obtained by taking its 2's complement.
The traditional 2's complement conversion circuit is based on full adders. The principle is to first flip all the bits, then add 1 to the LSB (least significant bit), then propagate the carry to the next bit if one is produced, and so on until there is no carry or until the MSB has been reached. For example, if the value of A3A2A1A0 is 1101, then the flipped value is 0010, and adding 1 gives 0011, which is the absolute value of the 2's complement code 1101.
In our design, A3 is the sign bit, so only 3 bits are subsequently compared. This means that only 3 full adders are needed to construct the 2's complement circuit. However, a conventional CMOS full adder requires 28 MOSFETs, and even the better-constructed Mirror Adder [7] requires 24 MOSFETs, so the first part requires at least 72 MOSFETs. In contrast, a 2's complement circuit based on a GDI and multiplexer structure requires only 10 MOSFETs.
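The flip-and-add-one procedure can be illustrated with a short behavioural sketch. This is not the adder circuit itself, only a software model of the arithmetic; the function name `twos_complement_flip_add` is chosen here for illustration.

```python
def twos_complement_flip_add(value: int, width: int = 4) -> int:
    """Negate a `width`-bit 2's complement code by flipping all bits and adding 1."""
    mask = (1 << width) - 1
    flipped = ~value & mask        # step 1: invert every bit
    return (flipped + 1) & mask    # step 2: add 1 at the LSB, drop any carry past the MSB


# The example from the text: 1101 (-3) becomes 0011 (+3).
print(format(twos_complement_flip_add(0b1101), "04b"))  # 0011
```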
Dattatraya proposes a 4-bit 2's complement circuit based on multiplexers [10]. The principle of the circuit is obtained by inspecting the truth table and can be understood as follows:
1. Starting from the LSB, copy the 0's from right to left until the first 1 is reached, and copy that first 1 as well.
2. Invert all subsequent digits, i.e., replace each 0 with 1 and each 1 with 0.
Based on this copy-and-invert principle, the 2's complement circuit can be implemented with multiplexers that either preserve the original bit or invert it. The detection of the flag "1" can be achieved with an OR gate, since OR-ing any value with 1 gives 1. Figure 2 shows a modified 3-bit 2's complement circuit based on multiplexers. Compared to a conventional adder-based 2's complement circuit, this circuit still performs well for multiple inputs, as demonstrated by the simulation data of Dattatraya et al. [10]. Thanks to the GDI technique, two transistors are required for the one OR gate, four transistors for the two multiplexers and four transistors for the two inverters, a total of only 10 transistors for the whole operation.
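The copy-and-invert rule can be sketched behaviourally as follows. This is a software model under stated assumptions, not the gate-level netlist of Figure 2: the running flag `seen_one` stands in for the OR gate, the conditional inversion stands in for a multiplexer choosing between a bit and its inverse, and all function names are hypothetical.

```python
def to_bits(value, width):
    """Return the bits of `value`, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from bits given LSB first."""
    return sum(b << i for i, b in enumerate(bits))


def twos_complement_copy_invert(bits):
    """2's complement of `bits` (LSB first) using the copy-then-invert rule."""
    out = []
    seen_one = 0  # OR-accumulated flag: 1 once a 1 has appeared at a lower position
    for b in bits:
        # Multiplexer behaviour: pass the bit while the flag is 0, invert it afterwards.
        out.append(b ^ seen_one)
        seen_one |= b
    return out


# 3-bit example: 101 (5) becomes 011 (3), the same as flipping and adding 1.
print(format(from_bits(twos_complement_copy_invert(to_bits(0b101, 3))), "03b"))  # 011
```

For every 3-bit input the result agrees with the flip-and-add-one definition, which is why no carry chain is needed.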

Absolute Value Circuit
While existing 2's complement circuits can convert positive and negative values to each other, they cannot by themselves extract the absolute value, because they negate every input regardless of its sign. Based on the existing 3-bit 2's complement circuit, we can simply add two multiplexers to take the absolute value of the input, i.e., keep a positive input and convert a negative input. Figure 3 shows the absolute value circuit.
Figure 3. Absolute value circuit.
The truth table shows that the LSB after the operation is always equal to the LSB of the input, so it needs no change. A3, as the sign bit, is always 0 in a positive value, so it does not need to take part in the subsequent comparison. What matters is how the two bits A1 and A2 change. We use A3 as the select input of the multiplexers. If A3 is 1, the input is negative and the value after the 2's complement operation is selected as the output. Conversely, if A3 is 0, the input is positive and the original value is selected as the output. Thus, the final absolute value conversion circuit requires only 7 logic gates and 14 transistors, far fewer than existing absolute value conversion circuits [2,7,8,11].
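A behavioural sketch of this selection step is given below. It assumes a 4-bit input code A3A2A1A0 and models the 3-bit 2's complement block arithmetically rather than gate by gate; the function name `absolute_value_4bit` is hypothetical.

```python
def absolute_value_4bit(code: int) -> int:
    """Return the 3-bit magnitude of a 4-bit 2's complement code A3A2A1A0."""
    a3 = (code >> 3) & 1          # sign bit, used as the multiplexer select
    low = code & 0b111            # A2 A1 A0
    negated = (-low) & 0b111      # stand-in for the 3-bit 2's complement block
    return negated if a3 else low  # A3 = 1 selects the converted value


print(absolute_value_4bit(0b1101))  # 3
print(absolute_value_4bit(0b0101))  # 5
print(absolute_value_4bit(0b1000))  # 0, since 1000 is treated as 0000
```

The model also reflects the observation that the LSB never changes, so only A1 and A2 need multiplexers.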

Comparator Circuit
Firstly, consider the Boolean expression for a 1-bit comparator: the output for A > B is 1 only when A = 1 and B = 0, i.e. $A\bar{B}$, and the two bits are equal when $\overline{A \oplus B} = 1$. The 3-bit comparator used in this study follows the classical comparator structure, which compares the MSBs first and then proceeds bit by bit towards the LSB. If the MSBs already differ, no later comparison is needed, because the higher-order bits carry greater weight. The logic structure of the comparator used in this design is shown in Figure 4. A is the output of the absolute value conversion circuit, and B is the target threshold that can be set manually by the user. A 1 (high) is output when the input value is higher than the target threshold, and a 0 is output when the input value is less than or equal to the target threshold. A new structure is used for the XOR gate in this design in order to reduce the delay as much as possible [12]; it is shown in Figure 5. The advantage of this XOR gate is that only four transistors are required, giving it a lower delay and smaller area than CMOS structures requiring 12 transistors and PTL structures requiring 8 transistors [12].
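The MSB-first comparison can be sketched as below. This is the textbook greater-than expression built from AND, NOT and XOR operations, offered as an illustration under that assumption; it is not claimed to match the exact gate arrangement of Figure 4, and the function name is hypothetical.

```python
def greater_than_3bit(a: int, b: int) -> int:
    """Return 1 if the 3-bit value a is greater than b, else 0."""
    result = 0
    higher_bits_equal = 1  # 1 while all more significant bit pairs have matched
    for i in (2, 1, 0):    # compare from the MSB down to the LSB
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        # A wins at this position only if every higher position was equal.
        result |= higher_bits_equal & ai & (bi ^ 1)
        higher_bits_equal &= (ai ^ bi) ^ 1  # XNOR: the bits match
    return result


print(greater_than_3bit(0b101, 0b011))  # 1
print(greater_than_3bit(0b011, 0b011))  # 0
```

Equal inputs give 0, which matches the requirement that the output is 0 when the input is less than or equal to the threshold.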

Path Delay
Typically, the delay of a circuit is calculated as the delay of its critical path, as the critical path always has the highest delay. In this design, since the delays of the individual logic gates are unknown and most logic gates are similar in structure, the path with the highest number of logic gates is assumed to be the critical path. Figure 6 shows the critical path for this design. The output load is a 32-unit-sized inverter. The channel width-to-length ratios for PMOS and NMOS are shown in the following equation.
To calculate the critical path delay, the gi (logical effort), hi (electrical effort), pi (parasitic delay) and bi (branching effort) of each logic gate need to be calculated first. Table 3 lists the logical effort and parasitic delay of each gate.
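When calculating the path electrical effort, the intermediate electrical effort terms all cancel each other out, leaving only the ratio of the output load to the input capacitance of the path: $H = C_{\text{out}} / C_{\text{in}}$. The path effort is $F = G B H$, where $G = \prod_i g_i$ is the path logical effort and $B = \prod_i b_i$ is the path branching effort. Because F is constant regardless of sizing, the path delay is minimized by making all N gates on the path have the same stage effort $f_i = g_i h_i = F^{1/N}$.
To allow comparison across process technologies, delay is conventionally normalized to FO4 units [13].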
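The calculation can be sketched numerically as follows. The sketch uses the standard logical effort relations and assumes an inverter parasitic delay of 1, so that one FO4 delay equals 5 unit delays; the gate values in the usage example are hypothetical and are not the entries of Table 3.

```python
import math


def min_path_delay(g, b, p, c_out, c_in):
    """Minimum path delay in unit delays (tau), with the optimal stage effort.

    g, b, p: per-stage logical effort, branching effort and parasitic delay.
    """
    n = len(g)
    path_effort = math.prod(g) * math.prod(b) * (c_out / c_in)  # F = G * B * H
    stage_effort = path_effort ** (1.0 / n)                     # equal effort per stage
    return n * stage_effort + sum(p), stage_effort


def to_fo4(delay_tau, p_inv=1.0):
    """Convert unit delays to FO4: an FO4 inverter has g = 1, h = 4, parasitic p_inv."""
    return delay_tau / (4.0 + p_inv)


# Hypothetical path: three inverters driving a load 64 times the input capacitance.
delay, f = min_path_delay(g=[1, 1, 1], b=[1, 1, 1], p=[1, 1, 1], c_out=64, c_in=1)
print(round(delay, 3), round(to_fo4(delay), 3))
```

For this hypothetical path the stage effort is about 4 and the delay is about 15 unit delays, or roughly 3 FO4.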

Results
The final circuit design, shown in Figure 7, consists of 20 logic gates with 56 transistors and has a delay of 5.94 FO4. If only CMOS technology had been used, a minimum of 96 transistors would have been required, so the hybrid logic structure reduces the number of transistors by approximately 41.7%. Compared with the existing absolute value detector circuit structure [11], which has 35 effective logic gates, the proposed structure reduces the number of logic gates by approximately 42.9% and the minimum delay by approximately 57.6%. Compared with the structure in Yuan's study [8], the number of logic gates is reduced by 35% and the minimum delay by about 38.8%. The absolute value detector circuit proposed in this study therefore reduces the delay substantially relative to these designs.
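As a check on the arithmetic, the transistor and gate-count reductions quoted above follow directly from the stated counts:

$$
\frac{96 - 56}{96} \approx 41.7\%, \qquad \frac{35 - 20}{35} \approx 42.9\%
$$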

Conclusion
This research designs a 5.94 FO4 absolute value detector with ultra-low delay based on a hybrid logic structure, which not only has a lower delay than conventional CMOS structures but also requires fewer transistors to achieve the same logic function, resulting in a smaller chip size. However, the minimum delay in this paper is only a theoretical value calculated using the logical effort principle, and additional non-ideal factors need to be taken into account in practice. In addition, the reduced output swing of the GDI structure may weaken the signal during transmission, which should also be considered in future studies. It is hoped that this study will provide new ideas for the development of neural signal acquisition systems.

Figure 4. Comparator circuit [2].

Figure 6. Critical path.
The delay of a multistage network is equal to the sum of its stage delays, where D is measured in units of the delay of an inverter without self-loading. The total path delay is: $D = \sum_i d_i = \sum_i (g_i h_i + p_i)$
The 2's complement circuit is designed in Chapter 4. Chapter 5 shows the absolute value circuit, and the comparator is presented in Chapter 6. The logical effort delay is calculated in Chapter 7. Chapter 8 gives the results of this research. The conclusion is discussed in Chapter 9.

Table 1. Number of MOSFETs in different techniques.

Table 3. Logical effort and parasitic delay for each gate.