Design and Implementation of Power-Efficient FSM based UART

Innovation in technology is driven largely by the demand for high-speed data communication. The Universal Asynchronous Receiver Transmitter (UART) is one of the most widely used serial communication protocols. This work focuses on implementing and analysing a UART for data communication. The baud rate generator, transmitter, and receiver modules are implemented using a Finite State Machine (FSM) approach. Cadence NCSIM was used for simulation, and Cadence RTL Compiler was used for synthesis with the 45 nm and 90 nm General Process Design Kit (GPDK) library files. The UART was designed for a baud rate of 9600 bps with a 50 MHz clock frequency. The increased speed and complexity of VLSI chip designs have resulted in a significant increase in power consumption. A comparative analysis of power and delay for different clock periods shows that the total power and the Power Delay Product (PDP) improve as the clock period increases. Better results were observed with the 45 nm library than with the 90 nm library.


1. Introduction
UART, a stand-alone Integrated Circuit (IC), is employed for serial data communication between computers or secondary devices, including long-distance communication in digital circuits, owing to its high reliability [9]. UART supports several modes of data transmission; in the full-duplex mode, the transmitter and the receiver modules work simultaneously [1]. Digital data can be transmitted from one device to another either in parallel or serially. A UART consists of a Baud Rate Generator (BRG), a transmitter, and a receiver. The BRG determines the speed of asynchronous communication. The baud rate defines the speed at which data is transferred from the transmitter to the receiver, in bits per second [9]. The transmitter and receiver must satisfy the timing parameters before the data transfer rate is decided. At the transmitter, the conversion from parallel to serial is carried out using a shift register. At the receiver, the serial data is converted back to parallel format, which requires a Serial In Parallel Out (SIPO) shift register.
The transmitter receives 8-bit parallel data from the device and stores it in a register. Transmission starts by creating the data frame: a START bit is placed at the beginning, followed by the data bits, the PARITY bit, and the STOP bit. This enables the receiver to determine when to start and stop reading the bits. The data frame consists of 11 bits. After detecting the stop bit, the receiver returns to the logic-one state and starts looking for the next frame. The receiver performs the serial-to-parallel conversion and passes the data on at the receiver end [1][2]. Figure 1 shows the UART data frame structure.
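The 11-bit frame described above can be illustrated with a minimal Python sketch. It assumes one START bit, eight data bits sent least significant bit first, a single even-parity bit, and one STOP bit; the bit ordering, the parity polarity, and the function name `build_uart_frame` are assumptions made for illustration and are not taken from the design itself.

```python
def build_uart_frame(data_byte, even_parity=True):
    """Return an 11-bit UART frame as a list of 0/1 values in transmission order.

    Assumed layout (illustrative only): START(0), 8 data bits LSB first,
    1 parity bit, STOP(1).
    """
    if not 0 <= data_byte <= 0xFF:
        raise ValueError("data_byte must fit in 8 bits")
    # Data bits, least significant bit first (assumed ordering).
    data_bits = [(data_byte >> i) & 1 for i in range(8)]
    ones = sum(data_bits)
    # Even parity makes the total number of ones (data + parity) even.
    parity = ones % 2 if even_parity else (ones + 1) % 2
    return [0] + data_bits + [parity] + [1]


frame = build_uart_frame(0xAA)
print(frame, len(frame))
```

The parity bit lets the receiver detect any single-bit error in the data bits, and the fixed START and STOP levels give it the frame boundaries.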

2. Literature Review
Many researchers have designed UARTs for serial communication using various strategies, such as algorithms and logical relations. The authors of [1] described the implementation of a UART using a Finite State Machine (FSM). In this architecture, the transmitter block comprises a baud rate generator, parity generator, transmitter, and shift registers. Similarly, the receiver, which operates at a 4 Mbps baud rate, includes a baud rate generator, negative edge detector, parity checker, receiver, and shift register. Xilinx Vivado 2016 was used for simulation and synthesis.
The UART design in [2] was implemented using an algorithmic state machine chart with a baud rate of 9600 bps, which is suited to short-distance serial communication at lower speed. The time taken to transfer the information was reduced by increasing the baud rate. The authors in [3] implemented the data communication using clock periods above 3 ns on a Kintex-7 FPGA. The design was analysed at various clock periods, and data transfer was found to be impractical at lower clock periods. Power analysis was performed using Virtex-6, Spartan 3, and Spartan 6 FPGAs in [4] for frequencies from 10 MHz to 100 MHz. It was concluded that the power dissipation was low at low frequency and high at high frequency.
In [5], a frequency-scaling and thermal-aware control unit was designed using 28 nm technology, and it was observed that the thermal properties increased with increasing frequency. The frequency was varied from the MHz to the GHz range, and the direct relation between power and frequency was analysed. The UART design developed in [8] used a unique frame format that involved a microcontroller/microprocessor managing the operation of the UART to achieve power saving. That UART was designed using Verilog HDL. The work in [9] discussed numerous synchronization problems that arise while designing the UART receiver. It showed that there is a maximum tolerable clock frequency deviation, and various mathematical derivations were discussed to gauge and adjust the reception. The work concluded that increasing the oversampling ratio, together with adding a parity bit to the data frame, improved the accuracy of data reception.
Using Spartan 6, Spartan 3, and Virtex 4 FPGA boards, Keshav Kumar et al. compared drain, quiescent, and overall power in [10]. Compared with the other FPGA boards, the Spartan 6 FPGA was found to require less power during operation. In [11], the UART design was simulated using Xilinx ISE Design 14.1, and the author described the role of thermal properties in reducing power consumption. Virtex-4, Virtex-5, and Virtex-6 FPGAs were used with 90, 65, and 40 nm technology libraries, respectively. When the different technology nodes were compared, it was observed that the 90 nm based Virtex-4 FPGA used comparatively less power. The behavior of the UART design with different I/O standards was observed in [12] in order to identify a design with minimal power loss. LVCMOS18 was found to be effective for power, and the input/output power was compared using two different capacitances.
The previous work lacks the analysis of power and delay, an essential criterion from the low power perspective. Analysing the proposed design regarding power and delay plays a significant role in

3. Implementation of UART Design
The UART design consists of a BRG, a transmitter module, and a receiver module, as shown in Figure 2. The BRG acts as a frequency divider circuit. A system clock of 50 MHz is used with a baud rate of 9600 bps. The clock output from the BRG is sent to the transmitter and receiver. The transmitter takes data in parallel and sends it out serially. The receiver takes data in serially and gives it out in parallel as the output. The baud rate factor is 16 (BCLK16), and the divisor is denoted D.
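As a worked example of the frequency-divider role of the BRG, the following Python sketch computes a divisor for the stated 50 MHz clock and 9600 bps baud rate. It assumes the conventional relation D = f_clk / (16 × baud rate) with the result rounded to the nearest integer; the exact formula and rounding used in the design are not given here, so this relation and the function name `baud_divisor` are assumptions.

```python
def baud_divisor(clk_hz, baud, oversample=16):
    """Compute an integer clock divisor for a 16x-oversampled baud clock.

    Assumes D = clk_hz / (oversample * baud), rounded to the nearest
    integer (illustrative convention, not confirmed by the paper).
    Returns (divisor, achieved_baud, error_percent).
    """
    bclk16 = oversample * baud            # target oversampled baud clock in Hz
    divisor = round(clk_hz / bclk16)      # nearest integer divisor
    achieved_baud = clk_hz / (oversample * divisor)
    error_pct = 100.0 * (achieved_baud - baud) / baud
    return divisor, achieved_baud, error_pct


divisor, achieved, err = baud_divisor(50_000_000, 9600)
print(f"D = {divisor}, achieved baud = {achieved:.2f} bps, error = {err:.3f} %")
```

Because the divisor must be an integer, the achieved baud rate differs slightly from the target; under these assumptions the deviation stays well below one percent.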

Implementation of UART Transmitter
The UART transmitter module converts the parallel data to serial form and sends it to the receiver module, as shown in Figure 3. This conversion is controlled by a Finite State Machine (FSM) whose states correspond to the elements of the data frame.
In the IDLE state, the transmitter is halted until a logic high is given to the start signal. When the start signal is one, the input is loaded in parallel into the registers. After the LOAD operation completes, the next state is DATA when COUNT = 15; in this state, the shift operation converts the data from parallel to serial using the PISO shift register. After the shift operation, OUT becomes 1. The parity generator generates

Implementation of UART Receiver
The UART receiver module converts the data from serial to parallel, as shown in Figure 4. This conversion is controlled by the FSM. The receiver module remains in the IDLE state while continuous ones are detected. When the line goes low, the receiver moves to the DATA state and starts to receive the data when DETECT_BIT = 1, performing the shift operation. If COUNT = 7, the next state is the parity state. If RX_COUNT = 15, the parity check has completed successfully. When the receiver detects the stop bit, it returns to the IDLE state, and all the data bits are output in parallel on RX_DATA_OUT.
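The receiver state sequence described above can be sketched as a small behavioural model in Python. This is a simplified illustration, not the implemented design: it takes one sample per bit rather than modelling the 16x baud clock and the COUNT/RX_COUNT counters, and it assumes LSB-first data with even parity. The state names PARITY and STOP and the function name `receive_uart_frame` are hypothetical.

```python
def receive_uart_frame(line_bits, even_parity=True):
    """Decode one frame from a sequence of line samples (one sample per bit).

    Simplified behavioural model: IDLE -> DATA -> PARITY -> STOP.
    Returns (data_byte, parity_ok).
    """
    state = "IDLE"
    shift_reg = []      # models the SIPO shift register, LSB first (assumed)
    parity_ok = None
    for bit in line_bits:
        if state == "IDLE":
            if bit == 0:                 # line goes low: start bit detected
                shift_reg = []
                state = "DATA"
        elif state == "DATA":
            shift_reg.append(bit)        # shift in one data bit
            if len(shift_reg) == 8:
                state = "PARITY"
        elif state == "PARITY":
            ones = sum(shift_reg)
            expected = ones % 2 if even_parity else (ones + 1) % 2
            parity_ok = (bit == expected)
            state = "STOP"
        elif state == "STOP":
            if bit != 1:
                raise ValueError("framing error: stop bit is not high")
            data = sum(b << i for i, b in enumerate(shift_reg))
            return data, parity_ok
    raise ValueError("incomplete frame")


# Idle ones, then a frame carrying 0xAA with even parity.
samples = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
data, ok = receive_uart_frame(samples)
print(hex(data), ok)
```

In hardware, each of these per-bit decisions would be taken near the middle of the bit period, as timed by the oversampled baud clock.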

4. Results
This section presents the simulation of the UART transmitter and receiver, together with the power analysis, delay, and PDP of the UART design for 45 nm and 90 nm technology.

Simulation of UART Design
The UART Design with the baud rate of 9600 bps with 50 MHz frequency was simulated to verify the design as shown in Figure 5. An input 8'hAA was given to the transmitter, and the output, DOUT, was obtained at the receiver.

Power, Delay and PDP Calculation
A reduction in power was observed as the clock period increased, as indicated in Tables 1 and 2. The PDP, defined as the product of the average power and the gate delay, is given by equation 3. The PDP decreased along with the total power. Hold and negative slack values of 0 ns are required to ensure reliable data communication; these values were achieved with clock periods greater than 1 ns. The results indicate a considerable reduction in power using the 45 nm cell library compared to the 90 nm cell library. The delay and power for 45 nm and 90 nm technology are plotted in Figures 6 and 7, which show the variation of the delay and PDP across clock periods. Table 3 compares the power of the proposed design with that of previously implemented UART designs, and Figure 8 plots this comparison. The plot shows a significant improvement in power for the proposed design over the existing designs.
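The PDP definition above (average power multiplied by delay) can be illustrated with a short Python sketch. The power value of 95.6 µW is the figure reported for the 45 nm library at an 8 ns clock period; the delay value of 2.0 ns is a placeholder chosen only for illustration and is not a result from this work. The function name `pdp_femtojoules` is hypothetical.

```python
def pdp_femtojoules(power_uw, delay_ns):
    """Power Delay Product in femtojoules.

    1 uW * 1 ns = 1e-6 W * 1e-9 s = 1e-15 J = 1 fJ, so the product of the
    two inputs in these units is already expressed in femtojoules.
    """
    if power_uw < 0 or delay_ns < 0:
        raise ValueError("power and delay must be non-negative")
    return power_uw * delay_ns


power_uw = 95.6        # reported total power, 45 nm, 8 ns clock period
delay_ns = 2.0         # PLACEHOLDER delay for illustration, not from the paper
print(f"PDP = {pdp_femtojoules(power_uw, delay_ns):.1f} fJ")
```

Since the PDP is a simple product, a design point with lower power reduces the PDP only if its delay does not grow by a larger factor, which is why power and delay are reported together.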

Conclusions
This work examined how the total power and PDP change with the clock period. The UART was implemented using FSMs, with simulation performed in Cadence NCSIM and synthesis in Cadence RTL Compiler using the 45 nm and 90 nm GPDK library files. The comparative analysis of total power consumption at different clock periods shows that the total power and the PDP decrease as the clock period increases. Data transmission is difficult for clock periods of less than 1 ns because of the timing constraints. At an 8 ns clock period, the total power is 95.6 µW in 45 nm technology and 133.80 µW in 90 nm technology. The comparison shows that the 45 nm technology gives superior power results to the 90 nm technology. The proposed design also showed a significant improvement in power over the existing designs.