ASIC Implementation for Multiple Scan at Register Transfer Level

Design for testability (DFT) was introduced to reduce the complexity of testing integrated circuits and to help improve fault coverage. However, conventional DFT, which adds a 2-to-1 multiplexer (MUX) to each flip-flop, increases delay and area overhead. Register transfer level (RTL) scan design was introduced to overcome this problem. This paper presents the development of an ASIC with RTL scan design that uses multiple scan paths instead of the conventional technique, which is gate-level (GL) scan insertion of scan cells. The method builds multiple scan paths by utilizing the existing multiplexers and operational units in order to reduce area overhead and delay. Multiple scan paths are implemented instead of a single scan chain because a single scan chain has a long test application time, owing to the high number of clock cycles needed to shift data through the chain. The RTL design is modified by adding extra gates for multiple scan chain insertion. A circuit graph is derived to analyse the connections between registers so that the multiple scan paths can be determined accordingly. Synopsys tools are used for synthesis, placement and routing, and automatic test pattern generation, while static timing analysis is used to verify the setup slack of the ASIC design. The results show that, for a finite impulse response circuit, multiple scan paths insertion at RTL has an area overhead of about 3.79% and about 0.44 ns more setup slack than multiple scan paths insertion at GL, with comparably high fault coverage. Performance metrics such as area, setup slack, and fault coverage of the multiple scan chains are compared between the RTL scan and the GL scan.


Introduction
The advancement of technology has resulted in electronic devices becoming more compact and lighter. Nowadays, integrated circuits contain tens of millions of small gates that perform more complex functions than ever before. Most modern integrated circuits (ICs) use sequential circuits to construct finite state machines. A complex integrated circuit could take up to tens of thousands of clock cycles to propagate one primary input value through the sequential logic to determine one output [1]. Functionality and correctness are important for a company that designs and manufactures integrated circuits, to ensure that its products are of high quality. This makes the development cycle costly in terms of testing time and effort [2]. To overcome this problem, design for testability (DFT) has been introduced to add testability to hardware product design and so make chip testing feasible [3]. This enables manufacturers to design and apply manufacturing tests to their products.
In gate-level (GL) scan insertion, each D flip-flop is replaced by a scannable flip-flop, which is composed of a 2-to-1 multiplexer (MUX) and a D flip-flop. However, the additional multiplexer may increase delay and area overhead. Furthermore, although the full scan approach generally provides high fault coverage, it is limited in terms of test application time because all flip-flops must be chained together to create a single path [4]. The scan time therefore becomes longer, which has an adverse effect on IC manufacturing. For this reason, register-transfer level (RTL) scan insertion with multiple scan chains was implemented to keep the overheads minimal, and an application-specific integrated circuit (ASIC) implementation is required to compare its performance in terms of area and delay with the GL scan insertion method of inserting DFT.
As the circuit grows more complex, the number of flip-flops rises, and using a single scan chain might result in a longer testing time. In a modern design, the number of flip-flops will be in the range of 10^6 for a design of 10^5 gates [5]. Single scan chains can prolong the testing process. During scan testing, the power consumed is three times that of normal operation, which may result in a drop in circuit performance during testing [6]. Using multiple scan chains is the best way to reduce the scan chain length: the flip-flops are divided among several chains, which decreases the scan time and reduces the test power [7]. When multiple scan chains are run concurrently during test mode, the longest scan chain's scan time becomes the design's scan time.
RTL scan insertion makes all the registers scannable by altering the RTL description. An RTL circuit can be configured in two ways: by using existing MUXs or by using operational units. With RTL scan, no additional MUX is needed to construct a scannable flip-flop, because the existing MUXs are utilized to obtain low-area scan paths. In the structure-aware scan, four modules were proposed: wire, MUX, operational unit, and logic cloud [6]. The wire module requires no modification, since it can be used in the scan path as it is. An if-else statement that is interpreted as a MUX in the original code needs to be modified so that, in scan mode, data can pass from the input pin to the output pin through the existing MUX. For the operational unit, the code needs to be modified so that, in scan mode, the right input (or side-input) is forced to zeros for an addition operation or to ones for a multiplication operation; the data from the scan-in (SI) pin can then propagate through the operational unit to the scan-out (SO) pin.
In this paper, multiple scan paths insertion is implemented at RTL by utilizing the existing MUXs and operational units. Multiple scan paths are constructed by arranging the sequential parts of a circuit into scan paths using MUXs, operational units, and a circuit graph. RTL scan insertion optimizes both the scan logic and the original logic prior to logic synthesis by modifying RTL statements in a way that makes every register scannable. Providing the scan chains at the functional RTL reduces the area overhead without compromising the fault coverage. Furthermore, this method tends to eliminate the delay caused by the additional MUX in the scan flip-flop for ASIC implementation.

Multiple Scan Paths
Designing with a single scan chain can lead to a long testing process. Furthermore, during scan tests, the power consumed is three times the power consumed during normal operation, which may result in a decrease in the performance of a circuit [6]. Thus, one of the solutions for reducing the scan chain length is to divide the flip-flops into multiple scan chains. This reduces both the scan time and the test power [7]. When multiple scan chains are run concurrently during test mode, the longest scan chain's scan time becomes the design's scan time. Figure 1(a) and Figure 1(b) illustrate single and multiple scan chains, respectively. As an example, in Figure 1(a), the scan time is 15 clock cycles. In Figure 1(b), the single chain is separated into four parallel scan chains, so the scan time is reduced to only four clock cycles.
An RTL scan design makes every register scannable by altering the RTL description [8]. An RTL circuit can be configured in two ways: by using existing multiplexers or by utilizing operational units. The multiplexer and the operational unit can be utilized by adding gates that control the selection signal of the multiplexer and that force the side-input of the operational unit to a value that lets the scan data propagate to the output. To illustrate, zeros must be applied to the side-input of an adder and ones must be applied to the side-input of a multiplier. The purpose of the RTL scan design is to eliminate the time delay associated with the additional multiplexer while obtaining low-area scan paths.
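The relation between chain partitioning and scan time described above can be illustrated with a minimal Python sketch. It assumes that only the shift cycles for loading one pattern are counted and that the flip-flops are split as evenly as possible; the helper names `balanced_chains` and `scan_time` are hypothetical, and the actual chain lengths in Figure 1(b) may differ from this balanced split.

```python
def balanced_chains(num_flip_flops, num_chains):
    """Split flip-flops into chains whose lengths differ by at most one.

    Assumption of this sketch: an even split; a real tool may partition
    the flip-flops differently.
    """
    base, extra = divmod(num_flip_flops, num_chains)
    return [base + 1 if i < extra else base for i in range(num_chains)]


def scan_time(chain_lengths):
    """Shift cycles needed when all chains shift concurrently.

    The longest chain determines the scan time of the design.
    """
    return max(chain_lengths)


single = balanced_chains(15, 1)    # one chain holding all 15 flip-flops
multiple = balanced_chains(15, 4)  # four parallel chains

print(single, scan_time(single))      # [15] 15
print(multiple, scan_time(multiple))  # [4, 4, 4, 3] 4
```

With 15 flip-flops, the single chain needs 15 shift cycles while four balanced chains need four, which matches the numbers quoted for Figure 1.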
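The two mechanisms can be modelled behaviourally as follows. This is a Python sketch rather than the Verilog used in this work, and it rests on several assumptions that are not taken from the paper: an 8-bit data path, scan data arriving on input `b` of the MUX, a single OR gate on the select line, and "ones" on the multiplier side-input read as the value 1 (the multiplicative identity). All function names are hypothetical.

```python
WIDTH = 8                    # assumed data-path width
MASK = (1 << WIDTH) - 1


def scan_mux(scan_en, sel, a, b):
    """Existing 2-to-1 MUX reused for scan.

    Assumption: scan data arrives on input b, so one OR gate on the
    select line (sel OR scan_en) forces b through in scan mode.
    """
    sel_eff = sel | scan_en
    return b if sel_eff else a


def scan_adder(scan_en, data, side):
    """Adder whose side-input is forced to zero in scan mode."""
    side_eff = 0 if scan_en else side
    return (data + side_eff) & MASK


def scan_multiplier(scan_en, data, side):
    """Multiplier whose side-input is forced to one in scan mode."""
    side_eff = 1 if scan_en else side
    return (data * side_eff) & MASK


# In scan mode the scan data passes through each unit unchanged.
print(hex(scan_mux(1, 0, 0x11, 0x22)))     # 0x22
print(hex(scan_adder(1, 0xA5, 0x3C)))      # 0xa5
print(hex(scan_multiplier(1, 0xA5, 0x07))) # 0xa5

# In normal mode the units behave as in the original circuit.
print(hex(scan_adder(0, 0xA5, 0x3C)))      # 0xe1
```

In each case only a small amount of gating is added around a unit that already exists in the data path, instead of a dedicated MUX per flip-flop.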

Methodology
The procedures in Figure 2 illustrate the processes required for ASIC implementation of multiple scan insertion, from RTL design to physical implementation. The project compares multiple scan insertion at RTL with multiple scan insertion at GL, as shown in Figure 2(a) and Figure 2(b), respectively.
In GL scan insertion, the DFT is inserted after the RTL code has been synthesized into a GL netlist. In RTL scan insertion, the DFT is inserted at the RTL level. Initially, this project focused on writing Verilog code for the original circuits and for the modified circuits with multiple scan insertion. An original circuit is a circuit without any modification. The Verilog code is used as input to Design Compiler, which produces the netlist files. For the original circuit, scan cells are inserted during the synthesis process. Then, the netlist files of both the original and the modified circuits are read by TetraMAX to generate test patterns and fault coverage. Similarly, the netlist files are used to perform the physical layout steps, namely floorplanning, placement, clock tree synthesis, and routing. A static timing analysis is also carried out using PrimeTime in order to obtain the timing paths. The performance is then recorded and analysed in terms of area overhead, setup time, and fault coverage.

Circuit Graph
Figure 3 depicts the circuit graph for the finite impulse response (FIR) circuit. A circuit graph is composed of vertices, directed edges, and edge-weights [9]. A vertex represents a primary input-port, a primary output-port, or the output-port of an RTL module; a directed edge represents the connection between vertices; and an edge-weight reflects the additional area created by scan insertion [10].
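One possible in-memory representation of such a circuit graph is sketched below in Python. The class `CircuitGraph` and its methods are hypothetical and are not the data structure used in this work. The vertex names are those of the FIR data path, while the edge-weights are placeholder values chosen only for illustration (0 for a connection assumed to need no extra logic, 1 for a connection assumed to need an additional MUX).

```python
from collections import defaultdict


class CircuitGraph:
    """Directed graph whose edge-weights model the area added by scan insertion."""

    def __init__(self):
        self.successors = defaultdict(dict)  # vertex -> {successor: edge-weight}
        self.vertices = set()

    def add_edge(self, src, dst, weight):
        """Add a directed edge src -> dst with the given edge-weight."""
        self.successors[src][dst] = weight
        self.vertices.update((src, dst))

    def path_weight(self, path):
        """Sum of edge-weights along a path, i.e. its estimated scan area cost."""
        return sum(self.successors[u][v] for u, v in zip(path, path[1:]))


graph = CircuitGraph()

# Placeholder weights: these values are assumptions, not measured areas.
chain = ["x_data", "xn", "xn_1", "xn_2", "xn_3"]
for src, dst in zip(chain, chain[1:]):
    graph.add_edge(src, dst, weight=0)
graph.add_edge("w_coeff", "w3", weight=1)

print(graph.path_weight(chain))            # 0
print(graph.path_weight(["w_coeff", "w3"]))  # 1
```

Summing edge-weights along a candidate path gives a simple way to compare how much extra area different scan paths would add.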

Multiple Scan Paths Determination
To identify multiple scan paths, a circuit graph corresponding to the RTL circuit needs to be constructed. Scan path segments are created to utilize the MUXs and operational units that correspond to wires and operational units with MUXs [10]. In Figure 3, two scan path segments are identified. Scan path segment_1 is x_data → xn → xn_1 → xn_2 → xn_3 and scan path segment_2 is reg_y → y. A scan path segment can be extended by connecting it to another scan path segment; here this is possible because scan path segment_1 has a primary input and scan path segment_2 has a primary output. Therefore, the extended scan path segment_1 for this circuit is x_data → xn → xn_1 → xn_2 → xn_3 → reg_y → y.
After scan path segment_1 is formed, registers w3, w2, w1, and w0 become floating registers, that is, registers without a scan connection. They must therefore be connected by using an additional MUX to create a scan path. Since the data path circuit has only one primary output, an additional primary output SO, named scanout, is necessary for this second scan path. Therefore, the scan path segment_2 for this circuit is w_coeff → w3 → w2 → w1 → w0 → scanout.
Figure 4 shows the modified circuit in RTL after inserting multiple scan paths by utilizing existing MUXs and operational units. The multiple scan paths consist of scan path segment_1 and scan path segment_2. The bold lines show the lines added in order to implement the multiple scan paths. Figure 5 shows the schematic view after multiple scan paths insertion.
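The bookkeeping behind this step can be sketched in Python as follows. The sketch only tracks which registers are already covered by a scan path; it does not search the circuit graph. The function names are hypothetical, and the register list is assumed from the names that appear in the scan paths above.

```python
def join_segments(first, second):
    """Extend a scan path segment by appending another segment to it."""
    return first + second


def floating_registers(registers, scan_paths):
    """Registers that no scan path passes through."""
    covered = {vertex for path in scan_paths for vertex in path}
    return [reg for reg in registers if reg not in covered]


# Segment starting at a primary input and segment ending at a primary output.
segment_in = ["x_data", "xn", "xn_1", "xn_2", "xn_3"]
segment_out = ["reg_y", "y"]
scan_path_1 = join_segments(segment_in, segment_out)

# Assumed register set of the FIR data path.
registers = ["xn", "xn_1", "xn_2", "xn_3", "reg_y", "w3", "w2", "w1", "w0"]

floating = floating_registers(registers, [scan_path_1])
print(floating)  # ['w3', 'w2', 'w1', 'w0']

# The floating registers are chained between w_coeff and the added SO pin.
scan_path_2 = ["w_coeff"] + floating + ["scanout"]
print(floating_registers(registers, [scan_path_1, scan_path_2]))  # []
```

Once both scan paths are in place, no register is left floating, so every register is reachable from a scan input and observable at a scan output.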

Results and Discussion
Two circuits, a finite impulse response (FIR) circuit and an infinite impulse response (IIR) circuit, are used as the circuits under test (CUT) to demonstrate the effectiveness of the proposed multiple scan paths at RTL in reducing area overhead and delay. The FIR circuit represents an open-loop circuit, whereas the IIR circuit represents a closed-loop circuit. DFT Compiler from Synopsys is used for GL scan insertion. The ASIC design is implemented in 32 nm complementary metal-oxide semiconductor (CMOS) technology. The RTL scan design is compared to the GL scan design, in which the number of scan paths inserted by DFT Compiler is the same as the number of bits in the SI pins. For example, the FIR circuit has two SI input pins of 8 bits each, so the number of scan paths is sixteen. Table 1 lists the characteristics of the RTL circuits in terms of the number of registers. The performance metrics of the ASIC are assessed before and after multiple scan paths insertion at RTL and at GL.
First, the effect on area is evaluated by comparing the RTL scan design with the GL scan design. As shown in Table 2, before scan insertion, the area of the FIR ASIC is 3219.67 µm². After insertion, the area of the GL scan design circuit is 3506.59 µm², while that of the RTL scan design circuit is 3341.72 µm². The area overhead is calculated as the logic added by scan insertion divided by the original logic. From these areas, the overhead of the RTL scan design is about 3.79%, while that of the GL scan design is about 8.91%, so the RTL scan design lowers the area overhead by about 5.12 percentage points compared to GL insertion. The RTL scan design has a lower area overhead than GL scan insertion because it is able to utilize existing MUXs and operational units, whereas in GL scan insertion another MUX must be added to every flip-flop, which increases the area overhead. The area overhead for the IIR circuit at RTL is also less than at GL.
Table 3 shows the setup slack for the FIR and IIR circuits. Setup slack is defined as the difference between the data required time and the data arrival time [11]. This parameter is evaluated to see the effect of scan insertion on delay. For the FIR ASIC, the original setup slack is 3.25 ns. After scan insertion, the slack of the RTL scan design circuit is 3.23 ns, compared to 2.79 ns for the GL scan design circuit.
In terms of the impact of scan insertion on delay, the RTL scan design has more slack, which means that it has a larger margin for data arrival and can operate at a high frequency as long as the slack is not violated. Next, fault coverage is evaluated to ensure that the RTL scan design increases the controllability and observability of the circuit. Table 4 shows that the RTL scan design is capable of achieving fault coverage above 95% for the respective designs. Therefore, the RTL scan design can improve fault coverage while meeting the setup slack and obtaining a lower area overhead.
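The area and slack comparison for the FIR circuit can be reproduced with a short Python sketch. It applies the overhead definition given above (added logic divided by original logic) to the areas quoted from Table 2. The function name is hypothetical, and the GL area is read as 3506.59 on the assumption that the trailing digit in the extracted text is the exponent of the squared area unit.

```python
def area_overhead_percent(original_area, scan_area):
    """Area added by scan insertion, as a percentage of the original area."""
    return 100.0 * (scan_area - original_area) / original_area


# FIR areas quoted in the text (GL value read as 3506.59, see the assumption above).
FIR_ORIGINAL = 3219.67
FIR_RTL_SCAN = 3341.72
FIR_GL_SCAN = 3506.59

rtl_overhead = area_overhead_percent(FIR_ORIGINAL, FIR_RTL_SCAN)
gl_overhead = area_overhead_percent(FIR_ORIGINAL, FIR_GL_SCAN)
print(f"RTL: {rtl_overhead:.2f}%  GL: {gl_overhead:.2f}%")        # RTL: 3.79%  GL: 8.91%
print(f"difference: {gl_overhead - rtl_overhead:.2f} points")     # difference: 5.12 points

# FIR setup slack in ns, as quoted in the text.
SLACK_ORIGINAL = 3.25
SLACK_RTL_SCAN = 3.23
SLACK_GL_SCAN = 2.79

print(f"slack lost, RTL: {SLACK_ORIGINAL - SLACK_RTL_SCAN:.2f} ns")  # 0.02 ns
print(f"slack lost, GL:  {SLACK_ORIGINAL - SLACK_GL_SCAN:.2f} ns")   # 0.46 ns
print(f"RTL margin over GL: {SLACK_RTL_SCAN - SLACK_GL_SCAN:.2f} ns")  # 0.44 ns
```

Evaluated on the quoted areas, the definition gives roughly 3.79% for the RTL scan design and 8.91% for the GL scan design, and the RTL scan design retains about 0.44 ns more setup slack.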

Conclusion
In conclusion, this research implements scan insertion at RTL and obtains a lower area overhead and a smaller delay compared to GL scan. In the scan insertion implementation, the numbers of scan paths, SI pins, and SO pins are the same. The method utilizes the existing MUXs and operational units of the original circuit, which are then optimized by a logic synthesis tool. By identifying the scan paths and modifying the RTL design, multiple scan paths have been implemented in the circuit. The comparison shows that multiple scan paths insertion at RTL achieves high fault coverage while improving setup slack and chip area compared to GL insertion. This method of multiple scan paths insertion at RTL therefore has advantages in terms of chip area, setup slack, and fault coverage for ASIC implementation. For future work, partial scan can be implemented, since it needs fewer MUXs because only selected flip-flops are made scannable; the area overhead is expected to be reduced further while high fault coverage is still obtained.

Figure 3. Circuit graph for data path circuit.

Figure 4. RTL scan design with multiple scan paths.

Figure 5. Schematic view after multiple scan paths insertion.

Figures 6 and 7 depict the physical layouts of the FIR and IIR circuits, respectively. Core utilization has been set at 0.8 so that the timing constraints are met after placement and routing while congestion is avoided. Both physical layouts are pad restricted, meaning that the die size is defined by the number of input/output (I/O) ports. The routing of the FIR circuit is visible directly in the IC Compiler graphical user interface (GUI) because the FIR circuit has fewer standard cells and I/O ports than the IIR circuit. The routing of the IIR circuit can be viewed by zooming into the layout.

Figure 6. Physical layout RTL scan for FIR.

Figure 7. Physical layout RTL scan for IIR.

Table 1. Characteristics of the RTL circuits.

Table 2. Area overhead result for RTL scan design and GL scan design.

Table 3. Setup time result for RTL scan design and GL scan design.

Table 4. Fault coverage result for RTL scan design and GL scan design.