Integration of ferroelectric devices for advanced in-memory computing concepts

In this work, the integration of ferroelectric (FE) devices for advanced in-memory computing applications is demonstrated based on the FeMFET memory cell concept. In contrast to the FeFET, which has the FE layer directly embedded in the gate stack, the FeMFET uses a separate ferroelectric capacitor that can be integrated in the chip interconnect layers. The optimization of the FE material stack under the resulting lower thermal budget constraints is discussed, as well as the significant performance improvement and reduction of variability achieved by superlattice FE stacks and further optimization knobs. Low memory state variability is important for accurate multiply-accumulate (MAC) operation. These improvements are demonstrated on a memory array test chip, including functional verification of MAC operation along a FeMFET-based array column with good accuracy over a high dynamic current range.


Introduction
Data storage is an integral part of today's microelectronic systems. 1,2) A wide range of memory concepts is available, ranging from volatile, very fast random access memories with low storage density to non-volatile flash memories that offer high storage density and large data volumes at the cost of long access times and higher write voltages. However, new application areas such as intelligent and energy-saving sensors for Internet of Things applications require new concepts that combine high speed, non-volatility and energy efficiency with good reliability. 2) In addition, such future edge applications require fast and power-efficient computing within new memory architectures. Non-volatile memory (NVM) based on ferroelectric (FE) hafnium zirconium oxide (HZO) is a promising storage technology to meet these requirements. In FE memories, data is stored in the two remanent polarization states (Pr+, Pr−), which arise from a displacement of ions in the FE crystal lattice. 3) Near the coercive field |Ec| the polarization switches between these two stable states, which can be read out by electrical measurements (see Fig. 1). Substantial efforts are being made to enhance important figures of merit of ferroelectric HfO2 films. Common strategies involve exploring various dopant elements, 4,5) concentrations, 6) and film thicknesses. 7,8)
Several device implementation options for the FE memory material can be considered, such as the ferroelectric FET (FeFET) 9,10) or ferroelectric random access memory (FRAM). 11) In the case of the FeFET, the FE material is integrated in the front-end-of-line (FEOL) process module directly in the gate stack of a transistor, with the advantage of a direct field effect of the FE material and therefore a non-destructive read operation. FRAM, in contrast, requires a destructive read operation and a write-back of the data after reading, because the stored information is sensed by evaluating the switched charge of the FE storage element connected to the drain side of a selector device. The switched charge depends on the previously stored information and is higher if the FE material has to switch during the read operation. On the other hand, FRAM has the advantage of using a symmetrical metal-ferroelectric-metal (MFM) stack, which brings enhanced reliability in contrast to the FeFET, which requires an asymmetric metal-ferroelectric-insulator-semiconductor (MFIS) stack. For enhancing FeFET reliability it is also important to manage the depolarization fields and to optimize the electrostatic characteristics of the interfacial layer to the channel. 12)
In the FeMFET, 15,16) the polarization state stored in the back-end-of-line (BEoL) MFM structure modulates the surface potential or conductivity of the semiconductor channel. This, in turn, affects the threshold voltage of the transistor. During readout, the drain current is detected at low gate voltages that keep the field in the FE layer well below the coercive field. The readout is therefore non-destructive. 3) An advantage of the FeMFET over the FeFET is the ability to dimension the FET and MFM areas independently. This area ratio tuning allows better control of the electrostatics for enhanced device reliability.
In compute-in-memory (CiM) applications, 19–21) it is important to keep memory cell and accumulation variability as low as possible in order to achieve a high level of classification accuracy. In addition to an earlier presentation, 22) this paper discusses FE superlattice stacks as a potential approach for reduced variability and demonstrates the MAC functionality of a FeMFET-based test array, emphasizing the capability of the FeMFET memory concept for application in CiM architectures.
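The effect of the area ratio mentioned above can be made concrete with a simple capacitive voltage divider. The following is only a sketch under a linear small-signal assumption: the MFM capacitor and the transistor gate stack are treated as two ideal capacitors in series, and the per-area capacitances c_FE and c_FET are symbols introduced here for illustration, not quantities taken from this work.

```latex
% Linear small-signal approximation: MFM capacitor in series with the FET gate stack.
% C_FE, C_FET: capacitance of the MFM capacitor and of the transistor gate stack.
% c_FE, c_FET: corresponding capacitances per unit area (illustrative symbols).
\begin{align}
  C_{\mathrm{FE}} &= c_{\mathrm{FE}}\,A_{\mathrm{MFM}}, \qquad
  C_{\mathrm{FET}} = c_{\mathrm{FET}}\,A_{\mathrm{FET}}, \\
  V_{\mathrm{FE}} &= V_{G}\,\frac{C_{\mathrm{FET}}}{C_{\mathrm{FE}} + C_{\mathrm{FET}}}
    = \frac{V_{G}}{1 + \dfrac{c_{\mathrm{FE}}}{c_{\mathrm{FET}}}\,
      \dfrac{A_{\mathrm{MFM}}}{A_{\mathrm{FET}}}}, \\
  V_{\mathrm{FET}} &= V_{G} - V_{\mathrm{FE}}
    = V_{G}\,\frac{C_{\mathrm{FE}}}{C_{\mathrm{FE}} + C_{\mathrm{FET}}}.
\end{align}
```

In this simplified picture, a smaller area ratio A_MFM/A_FET shifts a larger share of the gate voltage onto the FE capacitor and lowers the voltage across the gate insulator of the FET. The model ignores the switched polarization charge, leakage and charge trapping at the floating internal node.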

Experimental methods
The MFM capacitors were integrated into the BEoL of the X-FAB XT018 technology, a 180 nm BCD-on-SOI solution that supports automotive AEC-Q100 Grade 0 23) designs. This modular platform provides a diverse portfolio of voltage options and various automotive-qualified memory solutions. The FE HZO films were optimized to fulfill the BEoL requirements 6,24–26) and stabilize a significant portion of the orthorhombic phase 27–30) before wafers could undergo looping with X-FAB.
Initially, X-FAB pre-processed 200 mm silicon wafers up to the M4 metallization layer. Afterward, the wafers were transferred to the Fraunhofer IPMS cleanroom facility for FE stack deposition via atomic layer deposition (ALD) at 300 °C. The precursors HfCl4 and ZrCl4 were used, along with H2O as the oxidizing reactant and N2 as the purging gas. The wafers were then returned to X-FAB's cleanroom facility for deposition and patterning of the M5 metallization layer and completion of all remaining BEoL fabrication processes.
Two integration approaches and stacking options were examined for the MFM module [Fig. 2(b)]. The first approach utilized MFM capacitors with the standard top electrode (TE) and bottom electrode (BE) of the XT018 technology along with a standard 10 nm Hf0.5Zr0.5O2 FE film. No dedicated annealing step was employed to crystallize the HZO in this approach. Instead, the standard thermal budget of BEoL processing, which lasts about 2 h at a maximum temperature of 400 °C, was used to trigger film crystallization in the orthorhombic phase.
In addition, MFM capacitors with optimized TiN BE and TE and, between them, a 10 nm FE superlattice (SL) stack consisting of [1 nm HfO2/1 nm ZrO2] × 5 were employed. The FE stack was crystallized in the orthorhombic phase by a one-hour furnace anneal at 400 °C, followed by the regular thermal budget of the remaining BEoL processing steps. Prior research on coupons has shown that implementing SL structures can enhance the remanent polarization, 31) endurance, 32) and reliability under temperature and bias stress. 33)
The 1T1C FeFET memory cell concept was implemented as an 8 kbit NOR-based memory test array with 256 source line/bit line (BL) pairs and 32 word lines (WLs) [Fig. 3(a)], including peripheral control circuitry. 15) In the test array, the 1T1C FeFET memory cells were designed with an area ratio A_MFM/A_FET of about 0.15. For better readability, the following discussion refers only to the BLs and does not mention the corresponding source lines explicitly. 64-bit parallel write and read are accomplished by 64 integrated pattern driver and sense amplifier units, respectively. The test chip is operated by an external mixed-signal test system that provides static voltages for the inhibit and select functions on the BLs and WLs. Digital sequences for program, erase and read operations are executed via a standard serial peripheral interface (SPI). Additional decoders allow direct access to the BLs, bypassing the sense amplifiers. In this mode the BLs can be operated directly by the test system, allowing current accumulation tests at BL level.
The principle of multi-cell activation, multiplication and accumulation is shown in Fig. 3(b) for the example of three memory cells along a single BL column. In binary mode, each WL input X1-X3 is multiplied with the corresponding cell's state, either high Vth (HVT) or low Vth (LVT). A cell in the LVT state conducts current only if its input X is activated. In all other cases the cell does not conduct current, which corresponds to an AND function. The currents of all activated cells along the BL add up to an accumulated current, which is an analog quantity and can be digitized by an analog-to-digital converter (ADC) or measured with a source-measure unit (SMU). Because multiple WLs can be selected at the same time, the test chip supports such MAC operation.
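The binary MAC principle described above can be sketched in a few lines of Python. This is an idealized illustration and not the test chip's control code: the function names and the on-current value `I_ON` are hypothetical, every LVT cell is assumed to deliver the same current, and HVT or unselected cells are assumed to contribute no current at all.

```python
from typing import Sequence

# Hypothetical read current of one activated LVT cell, in amperes (illustrative value).
I_ON = 1.0e-6


def mac_count(inputs: Sequence[int], is_lvt: Sequence[bool]) -> int:
    """Number of conducting cells: sum over the column of (input AND LVT state)."""
    if len(inputs) != len(is_lvt):
        raise ValueError("inputs and cell states must have the same length")
    return sum(1 for x, w in zip(inputs, is_lvt) if x == 1 and w)


def bitline_current(inputs: Sequence[int], is_lvt: Sequence[bool]) -> float:
    """Ideal accumulated BL current: each conducting cell adds I_ON."""
    return mac_count(inputs, is_lvt) * I_ON


if __name__ == "__main__":
    # Three cells along one BL, as in the three-word-line example.
    x = [1, 1, 0]                  # WL inputs X1..X3 (1 = activated)
    cells = [True, False, True]    # True = LVT state, False = HVT state
    print(mac_count(x, cells), bitline_current(x, cells))
```

In a real array the accumulated current is an analog quantity with cell-to-cell spread, so the mapping from current back to an integer count depends on the ADC resolution and on the variability of the cell currents.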

Results and discussion
The MFM module was integrated between M4 and M5 into the BEoL of the XT018 technology [see Fig. 2(b)]. The most promising polarization versus electric field hysteresis characteristics were observed for MFM capacitors featuring optimized TiN BE and TE that underwent a dedicated annealing treatment at 400 °C for one hour, in addition to the thermal budget of the BEoL fabrication steps. These devices have low imprint in the virgin state (not shown), small device-to-device variability, and high remanent polarization [see Fig. 4(a)]. For MFM capacitors with the standard TE and BE of the XT018 technology that did not receive a dedicated FE annealing step, these parameters are degraded.
The opening of the initially pinched hysteresis loop during electric field cycling is referred to as wake-up. As anticipated for fluorite-structured FE materials, 34) wake-up characteristics are observed in all MFM capacitors. Nevertheless, the recorded number of wake-up cycles (∼10^4) is significantly lower than the demonstrated endurance of 10^6 field cycles. Although the incorporation of an SL stack improves reliability factors such as leakage current and imprint, the endurance, as shown in Fig. 4(b) at an applied electric field of 3.5 MV cm^-1 and a cycling frequency of 10 kHz, is only slightly improved and remains in the range of ∼2 × 10^6 field cycles. However, extrapolating from these conditions to the specified operating parameters projects an endurance exceeding 10^10 field cycles. 8)
Based on the promising characterization results on single capacitors, the material comparison was continued on larger memory cell statistics using the 8 kbit array test chip. Measurements were performed on a 64-bit page in order to compare the 10 nm HZO reference material with the improved 8 nm and 10 nm SL-based stacks.
As the cycling test in Fig. 5 shows, the optimized SL stack significantly improves the memory window (MW) and reduces the device-to-device variability compared to the reference HZO stack. This is in good agreement with the performance observed for the MFM module (Fig. 4). At the coupon level, previous studies have indicated that SLs with sublayers of 1 nm thickness can enhance the remanent polarization 31,35) and improve reliability under bias and temperature stress. 33) These beneficial effects are attributed to the stabilization of an increased fraction of the FE phase 31,35) and to the comb-like band structure, 33) respectively. This comb-like band structure arises from the different bandgap values of the HfO2 sublayer (ranging from 5.3 to 5.9 eV 36)) and the ZrO2 sublayer (ranging from 3.8 to 4.5 eV 37)). The low variability, which is an important requirement for controlled gradual switching and conversion in compute-in-memory circuits, is mainly due to the SL concept and the specifically tailored annealing process, which was optimized to enhance the quality of the FE stack. However, as expected, all devices show wake-up characteristics. Further optimization of the stack and crystallization process is required to reduce or eliminate this undesired characteristic. The MW comparison of the 8 nm and 10 nm SL stacks shows the typical relationship between dielectric thickness and MW for FeFET-based devices: thinner FE stacks come with a lower switching voltage but also a smaller MW. As can also be seen in Fig. 5, the 8 nm SL stack shows no increase in variability and still a good separation between program and erase states, which supports the use of smaller voltage margins and low-power operation.
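The memory window and device-to-device variability compared above can be quantified per page, for example as in the following sketch. It assumes that a threshold voltage has already been extracted for every cell in both states; the function names and the example values are hypothetical and are not measurement data from this work.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def memory_window(vth_hvt: Sequence[float], vth_lvt: Sequence[float]) -> float:
    """Mean memory window: difference of the mean threshold voltages of both states."""
    return mean(vth_hvt) - mean(vth_lvt)


def worst_case_margin(vth_hvt: Sequence[float], vth_lvt: Sequence[float]) -> float:
    """Separation between the lowest HVT cell and the highest LVT cell of the page."""
    return min(vth_hvt) - max(vth_lvt)


def summarize_page(vth_hvt: Sequence[float], vth_lvt: Sequence[float]) -> Dict[str, float]:
    """Collect window, margin and per-state spread (sample standard deviation)."""
    return {
        "memory_window_V": memory_window(vth_hvt, vth_lvt),
        "worst_case_margin_V": worst_case_margin(vth_hvt, vth_lvt),
        "sigma_hvt_V": stdev(vth_hvt),
        "sigma_lvt_V": stdev(vth_lvt),
    }


if __name__ == "__main__":
    # Hypothetical threshold voltages in volts for a few cells of one page.
    hvt = [1.10, 1.05, 1.15, 1.08]
    lvt = [0.30, 0.35, 0.28, 0.33]
    print(summarize_page(hvt, lvt))
```

The worst-case margin is the more conservative number for MAC operation, because a single outlier cell can already distort the accumulated current even when the mean memory window looks large.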
After wake-up, the SL stack shows stable endurance characteristics and continuously low variability for at least 10^5 program and erase cycles (see left side of Fig. 6). Note that more than 10^8 field cycles have been reported for the reference HZO stack of 10 nm thickness. 13) The same endurance is expected for the improved SL-based HZO stacks.
The right side of Fig. 6 shows the initial retention behavior of the SL stack, which indicates a slow but measurable degradation in contrast to the stable retention on single MFM devices. 15) Further retention characterization is needed to better understand the floating node and trapping behavior of the FeMFET cell concept.
To investigate the accumulation capability, 16 consecutive memory cells sharing the same BL were preconditioned in alternating order with the binary low Vth (LVT) and high Vth (HVT) state, respectively. The single-cell Id-Vg characteristics [see Fig. 7(a)] show low variability and good separation of the two states. For reproducible current accumulation of a large number of cells along a BL, sufficient current load capability of the BLs and the periphery is required. Figure 7(b) shows that there is no significant deviation between the measured current superposition of all cells and the calculated sum of the Id-Vg data obtained from the single-cell characteristics. The deviation in the lower gate voltage region is related to the limited BL current measurement accuracy in the subthreshold region. Consequently, this experiment demonstrates the capability of this memory technology for distortion-free current accumulation within a dynamic range of more than one order of magnitude.
© 2024 The Author(s). Published on behalf of The Japan Society of Applied Physics by IOP Publishing Ltd
In Fig. 8, the full operation of binary multiplication and accumulation is demonstrated for the example of incremental linear activation of parallel WLs. The WL input is AND-combined with the alternating cell states along the BL, and the resulting current is accumulated and measured at the BL bypass output. The current increase after every second additionally activated WL correlates with the alternating pattern in the memory cells and also matches the calculation based on single-cell measurements relatively well. This is a precondition for good accuracy of a hardware-implemented accelerator with such MAC-based compute-in-memory capability.
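The comparison between measured and calculated accumulation described above can be expressed as a short post-processing sketch. It assumes that all curves are sampled at the same gate voltages; the helper names and the numbers in the example are hypothetical and only illustrate the bookkeeping, not the measured data.

```python
from itertools import accumulate
from typing import List, Sequence


def summed_curve(single_cell_curves: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise sum of single-cell Id-Vg curves sampled at identical gate voltages."""
    return [sum(point) for point in zip(*single_cell_curves)]


def max_relative_deviation(measured: Sequence[float], calculated: Sequence[float]) -> float:
    """Largest relative difference between measured and calculated BL current."""
    deviations = [abs(m - c) / abs(c) for m, c in zip(measured, calculated) if c != 0.0]
    return max(deviations, default=0.0)


def incremental_activation(read_currents: Sequence[float]) -> List[float]:
    """Expected BL current after activating the first 1, 2, ..., N word lines."""
    return list(accumulate(read_currents))


if __name__ == "__main__":
    # Hypothetical read currents in microamperes for an alternating LVT/HVT pattern.
    pattern = [1.0, 0.0] * 8
    # The expected current rises by one step for every second activated word line.
    print(incremental_activation(pattern))
```

With an alternating pattern, the expected current forms a staircase with one step per two activated WLs, which is the signature discussed for Fig. 8. The relative deviation is only meaningful where the calculated current is well above the measurement floor, which is why the subthreshold region is treated separately in the text.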

Conclusions
This work showed that the FeMFET memory concept can be utilized for in-memory computing architectures. Low memory state variability, and therefore good classification accuracy, can be achieved by memory stack and process optimization. In particular, the superlattice stack in combination with thermal treatments and optimized electrodes showed a significant improvement with respect to low switching variability and reproducible memory cell characteristics. These enhanced properties were shown at memory array level. Finally, a multiply-accumulate operation in memory was demonstrated with good reproducibility and low current deviation, based on binary input activation and binary weight information.

Fig. 1. Schematic representation of the polarization properties of FE HfO2 layers. The displacement of ions in an FE HfO2 crystal is shown on the left. The electrically measurable polarization and current characteristics as a function of the electric field are shown on the right. The energy required to switch between the polarization states is very low, as shown by the small peaks in the displacement currents. (Reproduced with permission. 3) Copyright 2021, IEEE.)

Fig. 2. To realize a 1T1C FeMFET, MFM capacitors placed in the BEoL of a microchip can be connected to the gate contact of a standard logic device, as shown in the schematic device cross section (a), adapted with permission from Ref. 3. (b) Physical analysis cross section with a scanning electron micrograph (SEM) of the chip metallization planes with the embedded MFM memory element. The close-up of the MFM stack was taken by transmission electron microscopy (TEM).

Fig. 3. Schematic of (a) the investigated 8 kbit FeMFET test array with 256 source line (SL)/bit line (BL) pairs and 32 word lines as well as peripheral circuitry and interfaces for operating the array with a test system. The BL/SL multiplexer (MUX) features external access to the main SLG/BLG for investigating the accumulated current by an external ADC or source-measure unit (SMU). (b) The principle of such MAC operation based on three word lines.

Fig. 5. Memory array characterization result of a 64-bit page showing wake-up and switching behavior of 10 nm HZO reference stack compared to 8 nm and 10 nm SL stacks.

Fig. 6. Memory array characterization results of the 8 nm SL stack, comparing endurance cycling with no degradation and low variability up to 100 k cycles, and the initial retention characteristic.

Fig. 7. (a) Single-device drain current-voltage characteristics of 16 bit cells along a single BL in hexadecimal coding, with 8 cells each initialized in the high-Vth (HVT) and low-Vth (LVT) state, respectively. During the measurement of a single cell, all other cells sharing the same BL were kept inactive. (b) Measured accumulated current characteristic of all activated HVT and LVT cells along the same BL, as well as the current of only the activated HVT and only the activated LVT cells, respectively. The characteristic with symbols is calculated from the measured single devices in (a).

Fig. 8. Demonstration of MAC operation in memory with measured and calculated BL current depending on the number of up to 16 activated WLs that are incrementally set to "Read" condition.The table on top shows the corresponding number of activated cells in LVT and HVT state, respectively.