Low Power Memory System Design Using Power Gated SRAM Cell

Static Random-Access Memory (SRAM) is widely used in cache memory, microprocessors, general computing applications and electronic circuits involving ASICs, FPGAs and CPLDs. The most commonly used SRAM cell is the 6T cell. However, it incurs higher power consumption and a degraded static noise margin (SNM) during write and read operations. To overcome these shortcomings, a single-ended power-gated 11T SRAM cell for low-power operation is proposed. Power consumption is reduced through power gating with a virtual VSS (VVSS) signal and through transmission gates. Owing to the transmission gates, the memory cell achieves a better write margin than existing designs. The proposed cell achieves 33.33% lower power consumption and a 50% improvement in read SNM compared with existing SRAM designs. To study the impact of technology scaling on the proposed design, the work is carried out in the Cadence Virtuoso® tool using both 180nm CMOS technology and BPTM 32nm FinFET technology.


Introduction
Low-power memory design has had a major impact on SoC design. Because memory is embedded on chip, its power consumption needs to be reduced. Low power, high density and high performance have become the most important aspects of state-of-the-art memory design. The primary focus of this design is to improve the area efficiency, reduce the power consumption and enhance the performance of the SRAM cells that are embedded in such devices as an integral part.
In lower technology nodes, the leakage current of MOSFET devices increases due to second-order effects. Since FinFET technology provides improved electrostatic control over the channel, it is preferred to planar MOSFET technology. Lower technology nodes also enable the increased memory density required today and help minimize power dissipation in high-density memory chips. Methods for designing electrically high-performance SRAM cells using FinFET technology have been presented in [1], [2]. Furthermore, a 7T FinFET SRAM cell was validated for performance across temperature variation and different FinFET technology nodes, with a comparative analysis, in [3]. In this work, the various circuits of the power-gated SRAM memory are implemented and analysed in both the 180nm CMOS technology node and BPTM 32nm FinFET technology models.
The conventional 6T SRAM bit cell has two cross-coupled inverters and two access transistors connected to the data storage nodes [4]. The inverter pair forms a cross-coupled latch that holds the data for read and write operations. True and complementary values of the data appear on the bitline and bitbar line, respectively. In general, read and write operations require charging and discharging of the bitline and bitbar line. The bit cells are accessed through the access transistors during read and write operations by driving high the word line connected to the gates of the access devices. Because the same access transistors are used for both read and write operations, care must be taken in sizing them relative to the storage transistors for correct read and write operation of a 6T SRAM cell.
For a noise-free read operation, the NMOS pull-down transistors should be wider than the NMOS access transistors; for a write operation, on the other hand, the NMOS access transistors should be wider than the PMOS pull-up transistors. Proper sizing of a conventional 6T SRAM cell improves the static noise margin (SNM) and thus the performance of the cell.
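The two sizing rules above can be expressed as a simple check. The following is a minimal sketch, not the sizing procedure used in this work: the class name, function name and the example widths are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CellSizing:
    """Transistor widths of a 6T cell in nanometres (illustrative values only)."""
    w_pull_down: float  # NMOS pull-down width
    w_access: float     # NMOS access width
    w_pull_up: float    # PMOS pull-up width


def check_sizing(s: CellSizing) -> dict:
    """Apply the two width rules stated in the text."""
    return {
        # Read rule: pull-down must be wider than access.
        "read_ok": s.w_pull_down > s.w_access,
        # Write rule: access must be wider than pull-up.
        "write_ok": s.w_access > s.w_pull_up,
    }


# Hypothetical widths, not taken from the paper.
cell = CellSizing(w_pull_down=600.0, w_access=400.0, w_pull_up=240.0)
result = check_sizing(cell)
print(result)
```

Because both rules constrain the same access transistor from opposite sides, its width has to fall between the pull-up and pull-down widths.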
A dynamic column-based power-supply 8T SRAM structure is discussed in [5], where the stability of the 8T SRAM cell is compared with that of the conventional 6T cell using N-curve analysis. The static voltage noise margin (SVNM) was twice as good as that of the conventional 6T SRAM cell. Moreover, the write trip voltage (WTV) was improved compared with the conventional 6T cell. The differential 10T SRAM implemented in [6] has large parasitic capacitances due to the presence of both the bitline and the bitbar line, and its power consumption is therefore large. Power can be reduced by reducing switching activity. In single-ended SRAMs, the switching activity is halved, which results in lower power consumption than in 6T, 8T and 10T SRAM cells [6].
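The effect of halving the switching activity can be illustrated with the standard dynamic power relation P = alpha * C * Vdd^2 * f. This is a rough sketch under stated assumptions: the capacitance, supply voltage, frequency and the normalised activity factors below are hypothetical and are not measured values from this work.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Dynamic switching power P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical operating point, for illustration only.
C_BITLINE = 100e-15  # bitline capacitance in farads (assumed)
VDD = 1.8            # supply voltage in volts (assumed)
FREQ = 100e6         # access frequency in hertz (assumed)

# Differential cell: activity normalised to 1.0 for the bitline pair.
p_differential = dynamic_power(1.0, C_BITLINE, VDD, FREQ)
# Single-ended cell: switching activity reduced by half.
p_single_ended = dynamic_power(0.5, C_BITLINE, VDD, FREQ)

print(p_single_ended / p_differential)
```

Since power is linear in the activity factor, the ratio depends only on the activity reduction and not on the assumed capacitance, voltage or frequency.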
A sense amplifier is required to detect the bitline voltage during a read operation. Because only a single bitline is present, a single-ended sense amplifier design is required [7]. The results section reports the power consumption and delay of PG9T and PG11T. It also includes the simulation results of PG11T for write and read operations, along with the results of the 4x4 SRAM array during write and read operations in cells 0 to 15. Section VI concludes the paper.

Operation of PG9T SRAM cell
A power-gated 6T SRAM structure has been developed along with various leakage power reduction schemes, such as fine-grained and coarse-grained power gating [9]. The power gating technique has also been implemented in a 9T SRAM structure [10].
In the PG9T SRAM cell, power gating transistors and a Memory Data Read (MDR) transistor are used in addition to the cross-coupled inverter pair and access transistor [10]. During the write operation, the power gating transistors cut off the power supply to the bitcell, which reduces the supply-induced noise during writing. During the read operation, BL is isolated from the bitcell by activating the MDR transistor and deactivating the access buffer (AC2) transistors. Hence, the read current does not flow through the storage nodes, which eliminates the noise during the read operation. The main drawback of the PG9T SRAM cell is that it uses an NMOS access transistor, which passes logic 1 weakly. Hence, writing logic 1 into the cell becomes difficult [10].
In the proposed PG11T SRAM cell, transmission gates are used instead of access pass transistors so that logic 1 can be written more energy-efficiently. The power consumption is also reduced compared with PG9T. Figure 1 shows the PG11T SRAM cell. The cell consists of a positive-feedback latch made up of two cross-coupled inverters (transistors PU1, PD1, PU2 and PD2), two power gating transistors PGN and PGP, transmission gates TG1 and TG2, and an MDR transistor.
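The weak logic 1 of an NMOS access transistor, and the reason a transmission gate avoids it, can be illustrated with a first-order model. This is a simplified sketch that ignores body effect and leakage; the function names and the supply and threshold voltages are assumptions made for illustration.

```python
def nmos_pass_level(v_in, vdd, vth_n):
    """Highest voltage an NMOS pass transistor with its gate at vdd can pass.

    First-order model: the output is limited to vdd - vth_n.
    """
    return min(v_in, vdd - vth_n)


def transmission_gate_level(v_in):
    """An ideal transmission gate passes both rails without threshold loss,
    because the PMOS device conducts where the NMOS device cuts off."""
    return v_in


VDD = 1.8     # assumed supply voltage in volts
VTH_N = 0.45  # assumed NMOS threshold voltage in volts

print(nmos_pass_level(VDD, VDD, VTH_N))   # degraded logic 1
print(nmos_pass_level(0.0, VDD, VTH_N))   # logic 0 passes unchanged
print(transmission_gate_level(VDD))       # full-rail logic 1
```

In this model only the logic 1 level is degraded by the NMOS device, which matches the write-1 difficulty described for PG9T.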

Proposed PG11T SRAM cell
The transmission gates act as the access buffer. During the write operation, transmission gates TG1 and TG2 connect the single-ended BL to the internal storage node Q. The read operation is carried out by either holding the value of BL or discharging it through TG1 and MDR. The row-based signals are WL, WLB, WLPU and WLPD, while BL, VVSS, WWL and WWLB are the column-based signals.

Write operation.
During the write operation, WL and WWL are set to 1, and WLB and WWLB are set to 0. WLPU is kept high and WLPD is kept low to turn off PGP and PGN, respectively. To write a logic 1 into the cell, the virtual ground VVSS is driven high; to write a logic 0, VVSS is driven low. The BL value is then transferred into the SRAM cell, completing the write operation.

Hold operation.
The hold operation is performed by setting WL and WWL to 0 and WLB and WWLB to 1, with the power gating transistors PGP and PGN turned on. VVSS is held at 1. The content of the cell is then retained.

Read operation.
For the read operation, WL and WLB are activated by setting them to 1 and 0, respectively, while WWL and WWLB are deactivated by setting them to 0 and 1, respectively. The virtual ground VVSS is kept at logic 0. Prior to the read operation, BL is pre-charged to Vdd. When bit 1 is stored in the bitcell, the MDR transistor turns ON and the pre-charged bitline discharges through the transmission gate TG1 and MDR. When bit 0 is stored in the bitcell, MDR remains off and the bitline does not discharge. Hence, the bitline carries the complement of the stored data during the read operation.
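The signal settings of the three operations can be collected into one lookup. The sketch below is only a summary aid, and the function name is hypothetical. Two points are assumptions rather than statements from the text: the hold values of WLPU and WLPD are inferred from the write description (WLPU high and WLPD low turn PGP and PGN off, so the opposite levels are taken to turn them on), and the read values of WLPU and WLPD are not stated, so they are left as None.

```python
def control_signals(mode, data=None):
    """Return the control signal levels for a PG11T operation mode."""
    if mode == "write":
        if data not in (0, 1):
            raise ValueError("write needs data = 0 or 1")
        # VVSS follows the data being written: high for 1, low for 0.
        return {"WL": 1, "WLB": 0, "WWL": 1, "WWLB": 0,
                "WLPU": 1, "WLPD": 0, "VVSS": data}
    if mode == "hold":
        # Assumption: WLPU low / WLPD high turn PGP / PGN on.
        return {"WL": 0, "WLB": 1, "WWL": 0, "WWLB": 1,
                "WLPU": 0, "WLPD": 1, "VVSS": 1}
    if mode == "read":
        # WLPU / WLPD during read are not given in the text.
        return {"WL": 1, "WLB": 0, "WWL": 0, "WWLB": 1,
                "WLPU": None, "WLPD": None, "VVSS": 0}
    raise ValueError(f"unknown mode: {mode}")


for mode, data in (("write", 1), ("hold", None), ("read", None)):
    print(mode, control_signals(mode, data))
```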

Design of pre-charge circuit
The pre-charge circuit is used to speed up the read operation. Figure 2 illustrates the MOSFET-based pre-charge circuit. It uses two PMOS transistors (P1, P2) connected in parallel to drive the bitline high only when the pre-charge activation signal PC is low.
Figure 3 shows the write circuit used for writing to the cells. It consists of a buffer followed by an NMOS transistor N1 that drives the data onto the bitline when the Write Enable (WE) signal is activated. The buffer prevents degradation of the Data signal.

Design of sense amplifier
A sense amplifier from [8] has been used in this design with a few modifications. The pull-up PMOS transistor has been removed since a separate circuit is used for pre-charging. Furthermore, a footer NMOS N2 is added to prevent the connection to ground unless both the bitline and pre-charge are high. Pre-charging occurs with PC low, and evaluation occurs with PC high. Figure 4 shows the sense amplifier. When PC is high and the P2 transistor is turned on, P1 and N1 operate as an inverter with the bitline as input. As a result, the complement of the bitline is obtained as the output.
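The read path formed by the pre-charge circuit, the cell and the sense amplifier can be summarised as a purely logical model. This is a behavioural sketch only, with hypothetical function names; it ignores timing and analog levels, and returning None for the amplifier output during pre-charge is an assumption, since that value is not specified above.

```python
def precharge(pc, bl):
    """The pre-charge PMOS pair drives BL high only while PC is low."""
    return 1 if pc == 0 else bl


def cell_read(bl, stored_bit):
    """A stored 1 turns on MDR and discharges BL; a stored 0 leaves BL as is."""
    return 0 if stored_bit == 1 else bl


def sense(pc, bl):
    """With PC high the amplifier acts as an inverter on BL.

    Assumption: the output is treated as undefined (None) while PC is low.
    """
    return (1 - bl) if pc == 1 else None


def read_cycle(stored_bit):
    bl = precharge(pc=0, bl=0)       # pre-charge phase, PC low
    bl = cell_read(bl, stored_bit)   # cell selected, BL evaluates
    return sense(pc=1, bl=bl)        # evaluation phase, PC high


for bit in (0, 1):
    print(bit, read_cycle(bit))
```

In this model the inverted bitline at the amplifier output equals the stored bit, which is consistent with the bitline carrying the complement of the data during a read.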

Design of 4x4 single ended array
The 4x4 array consists of PG11T SRAM cells along with the simple peripheral circuits required. The peripheral circuits include decoders, multiplexers, pre-charge circuits, sense amplifiers and write circuits. In the array, WL, WLB, WLPU and WLPD are row-based signals, while BL, WWL, WWLB and VVSS are column-based signals. Two 2:4 decoders are instantiated for cell selection. A four-bit address A3, A2, A1, A0 is used for addressing. The two MSBs A3 and A2 select the row through the 2:4 row decoder, and the two LSBs A1 and A0 select the column through the 2:4 column decoder. A control signal, read (RD), indicates a read operation when high and a write operation when low. The read and write signals are switched by a 2:1 MUX. Figure 5 depicts the 4x4 array with the peripheral circuits.
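The address split described above can be sketched as follows. The function names are hypothetical, and numbering the cells 0 to 15 as row * 4 + column is an assumption made for illustration; the text does not state how the cell indices map to rows and columns.

```python
def decode_2to4(b1, b0):
    """2:4 decoder: return a one-hot list of four select lines."""
    index = (b1 << 1) | b0
    return [1 if i == index else 0 for i in range(4)]


def select_cell(a3, a2, a1, a0):
    """Split a 4-bit address into row (A3, A2) and column (A1, A0) selects."""
    row_select = decode_2to4(a3, a2)
    col_select = decode_2to4(a1, a0)
    # Assumed cell numbering: row-major, 0 to 15.
    cell = row_select.index(1) * 4 + col_select.index(1)
    return row_select, col_select, cell


row, col, cell = select_cell(1, 0, 1, 1)
print(row, col, cell)
```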

Design of 5:32 decoder
The 5:32 static decoder is implemented as shown in Figure 6. It comprises replicated pre-decoder and inverter circuits in addition to alternating NAND and NOR gate structures. The pre-decoder reduces the gate count and the number of input wires. It also reduces the number of stages from input to output, lowering delay and power consumption [11].
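The idea of pre-decoding can be illustrated at the logic level. The sketch below is not the gate-level structure of Figure 6: the split of the five address bits into a 2-bit group and a 3-bit group, and the function names, are assumptions chosen to show how each output needs only one line from each pre-decoded group.

```python
def one_hot(value, width):
    """Return a one-hot list of the given width with bit `value` set."""
    return [1 if i == value else 0 for i in range(width)]


def decode_5to32(address):
    """5:32 decode using two pre-decoded groups (assumed 2 + 3 bit split)."""
    if not 0 <= address < 32:
        raise ValueError("address must be in 0..31")
    high = one_hot(address >> 3, 4)    # pre-decode the two upper bits
    low = one_hot(address & 0b111, 8)  # pre-decode the three lower bits
    # Each of the 32 outputs combines one high line with one low line,
    # so the final stage needs 2-input gates instead of 5-input gates.
    return [h & l for h in high for l in low]


outputs = decode_5to32(19)
print(outputs.index(1), sum(outputs))
```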

Design of 32x27 single ended SRAM array
The single-ended SRAM cell is instantiated in the schematic to form an array similar to the 4x4 array. It stores 864 bits (32 x 27 cells). The column-based and row-based signals are the same as those of the 4x4 single-ended array. Figure 7 shows the 32x27 SRAM array. The 5:32 decoder in Figure 6 is used for address decoding in the 32x27 array.
Table 1 shows the power consumption for read and write operations with logic 1 and logic 0. It can be observed that the power dissipation of PG11T is lower than that of PG9T. Table 2 compares the delay of PG9T and PG11T for the 180nm and 32nm technology files.
Figure 8 shows the power consumption of the PG9T SRAM cell in 180nm CMOS technology. The write operation takes place from 10ns to 20ns, the hold operation from 20ns to 30ns, and the read and pre-charge operations from 30ns to 45ns. This cycle covers the logic 1 operations; the same cycle repeats for logic 0 from 55ns to 90ns. Figure 9 shows the power consumption of the PG11T SRAM cell in 180nm CMOS technology during read and write operations. Its read and write cycle is similar to that of the PG9T SRAM cell. Figures 10 and 11 show the power consumption of the PG9T and PG11T SRAM cells, respectively, during read and write operations in 32nm FinFET technology.
The PG11T cell is simulated in transient mode during read and write operations, as shown in Figure 12. Initially, a logic 1 is written into the cell and read back. Then a logic 0 is written and subsequently read. Figure 13 shows the transient simulation of the 4x4 array during the write operation. The cells are selected using the row and column decoders and the multiplexers. Once the cells are selected, data is written into them. Logic 1 is written into cells 0 to 15.
Figure 14 shows the transient simulation of the 4x4 array during the read operation, together with the sense amplifier outputs. All the bitlines are pre-charged before the cells are selected, and the cells to be read are then selected using the row and column decoders, multiplexers and sense amplifiers. The logic 1 stored in cells 0 to 15 during the write operation is read back, as confirmed by the bitline discharge. The output of the sense amplifier is the same as the data stored in the cells.

Results and Discussions
From Figures 8 to 11, it is evident that the power consumption during read 1 is higher than during read 0. This is because the bitline BL is pre-charged before the read operation. During read 1, logic 1 is stored in the cell, which turns on the MDR transistor and creates a discharge path from the pre-charged BL to ground. During read 0, logic 0 is stored in the cell and the MDR transistor stays off, so no discharge path exists for BL and BL only has to retain the charge previously stored on its nodal capacitance. Therefore, read 0 consumes less power than read 1. Spikes appear in the transient analysis whenever signals switch, which indicates that dynamic power is dissipated by the switching.
The SNM results are given in Table 3. The SNM is obtained by drawing the largest square in each of the two lobes of the butterfly curve and choosing the smaller of the two. The diagonal of that square is then taken as the SNM value.
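The difference between read 1 and read 0 power can be illustrated by the energy needed to restore the bitline after a read. This is a first-order sketch under stated assumptions: BL is taken to discharge fully to ground during read 1, leakage is ignored, and the bitline capacitance and supply voltage are hypothetical values rather than results from this work.

```python
def bitline_recharge_energy(stored_bit, c_bl, vdd):
    """Energy drawn from the supply to restore BL to vdd after one read.

    Recharging a capacitance c_bl from 0 V to vdd draws c_bl * vdd^2 from
    the supply; if BL was never discharged, nothing needs to be restored.
    """
    discharged = stored_bit == 1  # MDR conducts only when a 1 is stored
    return c_bl * vdd ** 2 if discharged else 0.0


C_BL = 50e-15  # assumed bitline capacitance in farads
VDD = 1.8      # assumed supply voltage in volts

e_read1 = bitline_recharge_energy(1, C_BL, VDD)
e_read0 = bitline_recharge_energy(0, C_BL, VDD)
print(e_read1, e_read0)
```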
It can be observed that the average power of the proposed SRAM structure is lower in 32nm FinFET technology than in 180nm planar MOSFET technology. Because devices at smaller technology nodes require a lower supply voltage, they dissipate less power. Furthermore, the FinFET structure, with the gate surrounding the channel on three sides, provides greater control over the channel. Thus, a FinFET can drive a larger current. The delay comparison in Table 2 shows that the delay is lower in 32nm FinFET technology, because FinFET devices have a faster switching speed and greater drive strength. Thus, the power and delay results in 32nm technology are even better than those of the design in 180nm technology.

Conclusion
A single-ended SRAM cell has been implemented that consumes 33.33% less power and also exhibits lower read and write delays than PG9T. The transient simulations have been carried out for the SRAM cell in 180nm CMOS technology and BPTM 32nm FinFET in Cadence