Design and implementation of a fast sequential multiplier based on iterative addition architecture

In this paper, a fast design and implementation of a sequential multiplier is presented. The suggested implementation uses a definition of iterative addition that reduces the number of additions required to calculate the product of two binary numbers. The proposed sequential multiplier reduces the shift operations required by the conventional sequential multiplier to a single shift operation applied to the final accumulated result. The proposed and conventional designs are implemented in Verilog and simulated in the Quartus II synthesis software tool. According to the simulation results, the proposed implementation outperforms the conventional implementation in terms of delay time and power consumption. The proposed sequential multiplier shows an average improvement of 17.15% in delay time compared to the conventional sequential multiplier.


Introduction
Multipliers are among the basic units used in implementing both simple and complicated digital circuits. Digital multipliers are at the heart of many devices and applications used in daily life. Since multipliers consume a large share of implementation area and power, optimizing their design can play a key role in optimizing the speed and area of digital circuits such as digital signal processing (DSP) systems, digital communication systems, and any other circuit that uses a multiplier in its structure [1,2]. Multipliers are at the core of operations such as convolution, cross correlation, and filter implementation, which are mainly used in DSP processes and applications [3,4]. Different algorithms have been used to implement the multiplication circuit. Some of them mimic the process of multiplying by hand, while others use special algorithms to carry out the multiplication.
The hardware implementation of a digital multiplier can be classified into combinational and sequential multiplier designs. Combinational multipliers are the direct and basic version of multiplication. Most combinational multipliers, such as array multipliers, Wallace tree multipliers, and Booth multipliers, mimic the basic definition of multiplication, in which a number of add and shift operations are used to find the final product. Sequential multipliers use a single adder circuit to accumulate the product. Sequential multipliers are smaller in area than combinational multipliers.
The aim of this research is to provide a fast and low-power design for a sequential multiplier based on a new look at the basic definition of multiplication. The rest of the paper is organized as follows. Section 2 presents some of the standard and basic designs of multipliers. The proposed implementation of the sequential multiplier is described in Section 3. Finally, the simulation results are discussed in Section 4, followed by conclusions in Section 5.
IOP Publishing doi:10.1088/1757-899X/1076/1/012040

Existing digital multipliers
In this section, some of the standard and basic designs of multipliers are presented. Multipliers such as the basic array multiplier, the Wallace tree multiplier, the Booth multiplier, and the conventional sequential multiplier can be considered the base techniques from which implementations of digital multipliers are developed.

Basic Array multiplier
Array multipliers employ an array of adders as a direct implementation of manual multiplication. Figure 1 shows the general implementation of a 4-bit basic array multiplier. A straightforward layout can easily be generated for hardware implementation, since the design uses the basic add and shift algorithm. This type of multiplier is easy to implement, but it requires a large design area that grows with the size of the multiplied operands.
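The add and shift structure described above can be sketched as a small behavioral model. This is an illustration only, not the circuit of Figure 1: the function name `array_multiply` and the `width` parameter are assumptions introduced here, and each loop iteration stands in for one row of AND gates and adders.

```python
def array_multiply(x: int, y: int, width: int = 4) -> int:
    """Behavioral model of an unsigned width-bit array multiplier (illustrative)."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    product = 0
    for i in range(width):
        y_bit = (y >> i) & 1
        # Row i of the array: every bit of x is ANDed with bit i of y.
        partial = x if y_bit else 0
        # The row is wired i columns to the left and summed by the adder row.
        product += partial << i
    return product
```

For two 4-bit operands, `array_multiply(13, 11)` accumulates the rows 13, 26, 0, and 104 and returns 143. The hardware computes all rows in parallel, which is why the area grows with operand size.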

Wallace Tree Multiplier
The Wallace tree multiplier is a tree-based implementation of the multiplication operation. It uses carry save addition (CSA) to reduce the delay time of multiplication. In a tree multiplier, the multiplication is done in three stages. It starts with the calculation of the bit products, continues with the reduction of the bit-product rows to two rows through the use of CSA, and ends with the calculation of the final result by adding the two rows generated by CSA [2, 5].
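The three stages can be illustrated with a row-level sketch. It is a simplification under stated assumptions: a real Wallace tree places full and half adders column by column, whereas this model applies a 3:2 carry-save step to whole rows at once. The names `carry_save_add` and `wallace_multiply` are introduced here for illustration.

```python
from typing import List, Tuple


def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 compression applied bitwise: three rows in, a sum row and a carry row out."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def wallace_multiply(x: int, y: int, width: int = 8) -> int:
    """Row-level model of a tree multiplier for unsigned width-bit operands."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    # Stage 1: bit products, one shifted row per multiplier bit.
    rows: List[int] = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    # Stage 2: reduce the rows to two using carry-save addition.
    while len(rows) > 2:
        reduced: List[int] = []
        full_groups = len(rows) // 3 * 3
        for i in range(0, full_groups, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full_groups:])  # leftover rows pass through unchanged
        rows = reduced
    # Stage 3: one carry-propagating addition of the remaining rows.
    return sum(rows)
```

No carry propagates during stage 2, since each carry-save step keeps the sum and carry rows separate. Only the final addition in stage 3 needs a carry-propagating adder.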

Booth multiplier
The Booth algorithm is one of the well-known approaches used to design multipliers with reduced area, high speed, and low power. Three units are used in a Booth multiplier to complete the multiplication: the decoder unit, the partial product generation unit, and the adder unit. Although the Booth algorithm contributes to enhancing the multiplier design in terms of speed, a large number of partial products is still required to complete the multiplication [6-9].
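As an illustration of how the three units cooperate, the sketch below models Booth recoding in Python. The text does not state which radix is used, so radix-2 recoding of signed operands is assumed here, and the name `booth_multiply` is hypothetical.

```python
def booth_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Radix-2 Booth multiplication of two signed width-bit integers (illustrative)."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not (low <= multiplicand <= high and low <= multiplier <= high):
        raise ValueError("operands must fit in a signed width-bit word")
    product = 0
    previous_bit = 0  # implicit bit to the right of the least significant bit
    for i in range(width):
        current_bit = (multiplier >> i) & 1
        # Decoder unit: recode the bit pair into a digit of -1, 0, or +1.
        digit = previous_bit - current_bit
        # Partial product unit: +M, 0, or -M, weighted by 2**i.
        partial = digit * (multiplicand << i)
        # Adder unit: accumulate the partial product.
        product += partial
        previous_bit = current_bit
    return product
```

For example, `booth_multiply(-7, 3)` recodes the multiplier 3 as the digits -1, 0, +1 (least significant first) and returns -21. Runs of identical multiplier bits produce zero digits, so their partial products need no addition.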

Conventional sequential multiplier
Although combinational multipliers have a simple structure and take less time to complete the multiplication, their design area grows with the size of the multiplied numbers. Sequential multipliers are the implementation option for a reduced-area multiplier design.
The conventional sequential multiplier generates the partial products of the multiplication in sequenced steps, in a way that mimics manual multiplication. The sequential steps accumulate the partial products, which are generated by shifting the multiplicand left and the multiplier right [10]. Figure 2 shows the general datapath structure of an 8-bit sequential multiplier in its conventional implementation [11].
Figure 2. Datapath structure for 8-bit sequential multiplier (conventional implementation) [11].
A controller unit generates all the signals used by the sub-modules of the datapath unit. Even though the sequential multiplier reduces the design area by avoiding arrays of adders, the multiplication process is still slow. The time-consuming part of the conventional sequential multiplier is the number of sequential shift operations required to find the product of two numbers.
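The shift-and-accumulate behavior described above can be modeled in a few lines. This is a behavioral sketch only and does not reproduce the datapath of Figure 2. The name `sequential_multiply` and the returned cycle count are assumptions added for illustration.

```python
from typing import Tuple


def sequential_multiply(x: int, y: int, width: int = 8) -> Tuple[int, int]:
    """Model of a conventional shift-and-add sequential multiplier.

    Returns the product and the number of shift cycles that were performed.
    """
    mask = (1 << width) - 1
    multiplicand = x & mask
    multiplier = y & mask
    accumulator = 0
    cycles = 0
    for _ in range(width):
        # Add the current partial product when the low multiplier bit is 1.
        if multiplier & 1:
            accumulator += multiplicand
        multiplicand <<= 1  # shift left (multiplicand)
        multiplier >>= 1    # shift right (multiplier)
        cycles += 1
    return accumulator, cycles
```

In this model, an 8-bit multiplication always runs for eight cycles, each with one shift of both registers, whatever the operand values. For instance, `sequential_multiply(200, 100)` returns `(20000, 8)`.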

Proposed sequential multiplier
The idea behind the suggested implementation of the binary multiplier is to use the most basic definition of multiplication. Multiplication can be defined as repetitive addition in which one of the two numbers (the multiplicand) is added to itself a number of times equal to the value of the second number (the multiplier). Used as is, this definition requires a number of additions equal to the value of the multiplier, which is time consuming. The number of repetitive additions can be reduced by half by using the proposed equations. In the scheme described by the states below, the multiplicand is accumulated once for each count held in the upper bits of the multiplier, Y[7:1], the accumulated result is doubled with a single left shift, and one more addition is performed when the multiplier is odd.

The proposed multiplier is designed as a sequential multiplier according to the hardware architecture shown in Figure 4. Its control unit moves through the following three states.

State 0 (S0): the state in which the accumulator register R is cleared and the external data for X and Y is loaded into registers X and Y while the start signal is zero. The control unit remains in state 0 as long as the start signal equals zero and moves to the next state when the start signal becomes one.
State 1 (S1): the state in which X is added to R and Y[7:1] is decremented by one while the 'Zero' signal is low. As long as 'Zero' is low, addition and decrementing continue through the setting of the 'LoadR' and 'Dec' signals. When 'Zero' is high, the content of register R is shifted left by setting the 'Shiftl' signal. One more addition is performed through 'LoadR' when the 'Odd' signal is 1, and the control unit then moves to the next state.
State 2 (S2): the last state, in which the multiplication is complete and the control signal 'Done' is generated. No change occurs in this state as long as start equals 1. When start equals zero, the control unit moves back to the reset state to process a new multiplication.
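Read together, the three states appear to compute X·Y = 2·(X·⌊Y/2⌋) + (Y mod 2)·X. That identity is inferred here from the state description and is not quoted from the proposed equations. The following behavioral sketch walks through the states under that reading. The function name `iterative_addition_multiply`, the `count` variable standing in for Y[7:1], and the returned addition counter are assumptions introduced for illustration.

```python
from typing import Tuple


def iterative_addition_multiply(x: int, y: int, width: int = 8) -> Tuple[int, int]:
    """Behavioral model of the three-state iterative-addition multiplier.

    Returns the product and the number of additions that were performed.
    """
    mask = (1 << width) - 1
    x &= mask
    y &= mask

    # S0: clear the accumulator R and load the operands.
    r = 0
    count = y >> 1  # Y[width-1:1], the part that is decremented
    odd = y & 1     # Y[0], which drives the 'Odd' signal
    additions = 0

    # S1: add X to R and decrement Y[width-1:1] while 'Zero' is low.
    while count != 0:
        r += x      # 'LoadR'
        count -= 1  # 'Dec'
        additions += 1
    r <<= 1         # 'Shiftl': the single shift on the accumulated result
    if odd:
        r += x      # one more addition when 'Odd' is 1
        additions += 1

    # S2: 'Done'; R holds the product.
    return r, additions
```

For example, `iterative_addition_multiply(13, 11)` performs five additions to reach 65, shifts once to 130, and adds 13 a final time because 11 is odd, returning `(143, 6)`. Direct repetitive addition would need eleven additions for the same operands.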

Simulation Results
The 8-bit multiplier designs (both the proposed and the conventional sequential multiplier) are implemented in Verilog and synthesized with the Quartus II synthesis software tool. The performance analysis, including time delay, power dissipation, and number of logic elements, is recorded using the Altera FPGA EP2C5T144C6 device (Cyclone II). Structural design implementations of the two components, the conventional and the proposed sequential multiplier, were built using Quartus II Verilog HDL synthesis. A top-level component is built to instantiate all the lower-level components of each multiplier. The designs were implemented and synthesized based on the structures shown in Figure 2 and Figure 4 (the conventional and proposed sequential multipliers, respectively). Table 1 compares the proposed and conventional sequential multipliers in terms of worst-case delay time, power consumption, and number of FPGA blocks (logic elements).
Based on the simulation results, the proposed design performs better than the conventional sequential multiplier in terms of delay time and power consumption. The results show that the proposed design has a slightly larger number of logic elements, but its delay time and power consumption are significantly reduced. Since a standard adder design (i.e., a ripple carry adder, RCA) is used to implement the accumulator unit of the proposed multiplier, its area is slightly larger than that of the conventional multiplier. The comparison of the proposed and conventional sequential multipliers in terms of delay time, power consumption, and number of FPGA blocks is shown in Figure 6.

Conclusions
In this paper, a new approach is proposed for implementing a fast sequential multiplier. The approach yields a sequential multiplier implementation that reduces the shift operations required by the conventional sequential multiplier to a single shift operation applied to the final accumulated result. The conventional and proposed designs are implemented in Verilog and simulated in the Quartus II synthesis software tool. The simulation results demonstrate that the proposed design of the sequential multiplier is faster than the conventional design. The proposed sequential multiplier reduces the delay time of the conventional design by 17.15%. As future work, the total number of logic elements could be reduced by applying further enhancements to the design of the adder, which occupies most of the area of the proposed multiplier.