Advancing electron microscopy using deep learning

Electron microscopy, a sub-field of microanalysis, is critical to many fields of research. The widespread use of electron microscopy for imaging molecules and materials has had an enormous impact on our understanding of countless systems and has accelerated progress in drug discovery and materials design for electronic, energy, environmental and health applications. With this success, a bottleneck has emerged: the rate at which we can collect data has significantly exceeded the rate at which we can analyze it. Fortunately, this has coincided with the rise of advanced computational methods, including data science and machine learning. Deep learning (DL), a sub-field of machine learning capable of learning from large quantities of data such as images, is ideally suited to overcoming some of the challenges of electron microscopy at scale. There are a variety of DL approaches relevant to the field, each with unique advantages and disadvantages. In this review, we describe some well-established methods, with recent examples, and introduce some new methods currently emerging in computer science. Our summary of DL is designed to guide electron microscopists in choosing the right DL algorithm for their research and to prepare them for their digital future.


Introduction
Microanalysis refers to the examination of small samples or components of a material or substance in order to identify and quantify their chemical or physical properties. This type of analysis is typically carried out using advanced analytical techniques that allow for the detection and measurement of extremely small quantities of substances, often on the order of micrograms or even nanograms. These techniques include electron microscopy [1][2][3], which uses a focused beam of electrons to produce high-resolution images of a sample; x-ray diffraction [4], which uses the patterns produced when x-rays scatter from a sample; Fourier transform infrared spectroscopy [5], which probes the vibrational modes of a sample; atomic force microscopy [6], which scans the surface of a sample; energy-dispersive x-ray spectroscopy [7] and Auger electron spectroscopy [8], which detect x-rays and electrons (respectively) produced when a sample is bombarded with electrons; and Raman spectroscopy [9], which uses laser light to study the vibrational modes of a sample. These techniques can be used to identify the composition of materials, the presence of impurities or contaminants, and the distribution of various components within a sample. They can also be used to study the physical properties of materials, such as their crystal structure or their electronic and magnetic properties, and allow researchers to obtain detailed information that is useful in a wide range of applications, including materials science, pharmaceuticals, and environmental science.
However, microanalysis suffers from a number of limitations. While many techniques offer high spatial resolution, there are limits to how small a feature can be resolved [10][11][12][13], and this can be a particular challenge when working with complex materials or structures. Microanalysis techniques are often limited by their ability to detect low concentrations of substances [14], which can be a problem when trying to analyze trace amounts of impurities or contaminants in a sample. Arguably the most significant challenge to researchers in recent years is that microanalysis techniques generate large amounts of data [15], which can be difficult to interpret and analyze without specialized software and expertise.
For example, the images (micrographs) generated by transmission electron microscopy, a widely used microanalysis technique, currently rely on manual labeling by experienced researchers [16], but manually labeling large numbers of micrographs is infeasible, and the need for skilled professionals continues to grow [17, 18]. Additionally, obtaining the correct configuration of a transmission electron microscope, which is related to quality parameters and research subjects, is still a complex process. According to Franken et al [1], brightness, coherence and stability constitute the quality parameters, and all of them are related to the configuration of the electron beam. However, the management of the electron beam is also directly connected to radiation damage. For instance, an increase in acceleration voltage leads to a decreased wavelength, which makes radiation damage less likely and increases resolving power while decreasing contrast. Conversely, a decrease in acceleration voltage leads to an increased wavelength, which makes radiation damage more likely and decreases resolving power while increasing contrast [1]. Egerton et al [19] also confirm that radiation damage is associated with electron dose, and Egerton [20] describes the management of radiation damage as a complex, manual process. Hence, reducing the need for manual labeling and automating the configuration of the quality parameters used in transmission electron microscopy will have a big impact on the microanalysis industry.
Rather than developing new instrumentation and retraining more professionals, an alternative approach that has been receiving considerable attention is to combine microanalysis with advanced computational analysis. Deep learning (DL), a sub-field of machine learning built on artificial neural networks with multiple layers, is particularly relevant and worthy of review. In essence, DL algorithms develop complex computational models consisting of multiple processing layers to learn representations of data with multiple levels of abstraction. These techniques have significantly advanced the state of the art in fields such as visual object recognition, object detection [21], object segmentation [22], and object tracking [23], as well as global optimization [24].
There are, however, a large number of DL algorithms, each with different capabilities, advantages and disadvantages; and the list of candidate methods grows every year in this rapidly progressing field. It is not immediately clear to researchers outside computer science which DL methods are suited to different problems, or where their limitations may be prohibitive. Knowledge of their architectures and applications is therefore important when selecting which one to use, or when combining several of them into a coherent workflow, regardless of the problem in electron microscopy. Furthermore, training these systems is expensive, considering factors such as the carbon footprint, so making an informed choice is key. Additionally, as the size of microscopy datasets increases and computational resources become limited, differentiating between the several types of neural networks that have similar functions becomes critical.
As we will discuss here, DL could provide a solution to some of the growing challenges in electron microscopy, and potentially other areas of microanalysis in the future, provided the right method is selected and applied.

Background
Electron microscopy has many different variants [25]. Scanning electron microscopy (SEM) [26, 27] is a powerful imaging technique that uses a focused beam of electrons to generate high-resolution images of the surface of a sample. The earliest paper introducing the concept of SEM was by Knoll [28], and the first application of SEM was reported in 1942 [29]. SEM scans a beam of high-energy electrons over the surface of a sample, which causes secondary electrons to be emitted from the surface [30]. These secondary electrons are then detected and used to form an image of the sample's surface [30]. One of the main advantages of SEM over other imaging techniques is its ability to produce images with high resolution and depth of field, and to detect small features and defects on the surface of a sample [30]. This makes it ideal for examining the surface morphology and topography of a wide range of samples, including biological specimens, materials, and geological samples [30]. SEM imaging is non-destructive [31], meaning that the sample can be imaged without altering or damaging its structure. SEM can also be used to analyze the chemical composition of a sample using techniques such as energy-dispersive x-ray spectroscopy or wavelength-dispersive x-ray spectroscopy [32], and it can produce images in three dimensions [33].
Disadvantages of SEM include the need for careful sample preparation [34]: coating the sample with a conductive material can be time-consuming and technically challenging. SEM imaging must be performed in a vacuum, which can limit the types of samples that can be imaged and can affect the accuracy of the results, and SEM can only image samples that fit within the size limits of the microscope chamber, which can be a limitation for some types of samples. SEM imaging is also prone to artifacts [35], such as charging, beam damage, or poor image resolution, which can affect the accuracy and reliability of the results.
The application of DL to both TEM and SEM will be reviewed in this paper, since they are the two major types of electron microscopy. We will conclude by recommending some areas worthy of future attention and some opportunities emerging at the interface between these two fields that could be mutually beneficial.

Data processing
Data processing is an important step when applying DL algorithms because the performance of the model is significantly affected by the quality of the data. The history of data processing is extensive, and there are various stages and techniques, such as data cleaning [104], data imputation [105], data transformation [106], data aggregation [107] and feature selection [108], all of which impact model performance.
Several data processing techniques have been introduced into DL pipelines to manage data volume. Data augmentation is often used for tasks involving image datasets, which is particularly relevant to microscopy. For example, data augmentation was employed by Krizhevsky et al [109] to decrease overfitting on image data, resulting in a model with better performance. Similarly, Rebuffi et al [110] use data augmentation to improve the robustness of their model by reducing overfitting, and their results eventually surpassed the previous top model on the CIFAR-10 dataset [111]. Feature selection is also used in many DL studies, and has been shown to accelerate the performance of a scene classification task [112]. Feature selection can also provide robust and optimal features for complex problems [113].
Some data processing techniques have been developed specifically for DL, such as batch normalization. Ioffe and Szegedy [114] showed that batch normalization can accelerate the training of deep networks, and Bjorck et al [115] also showed that batch normalization can boost training speed and improve accuracy. Image resizing is another useful technique: Sergio and Abdussalam [116] have shown that smaller images tend to facilitate faster training of computer vision algorithms, and Bisogni et al [117] report that image resizing can improve generalization and decrease overfitting. Moreover, Bals and Epple [84] resized their images obtained from STEM to fit their DL algorithm, and as a result the accuracy of their algorithm on a classification task reached more than 90%.
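To make these steps concrete, the following is a minimal sketch (not taken from any of the cited works) of how augmentation, resizing, intensity normalization and batch normalization are commonly combined in a PyTorch/torchvision pipeline for greyscale micrographs; the mean, standard deviation and layer sizes are placeholders that would be set from the training data.

```python
# A minimal sketch of the data processing techniques discussed above.
import torch.nn as nn
from torchvision import transforms

# Augmentation + resizing + intensity normalization (expects a PIL image as input).
# The mean/std values are placeholders; in practice they are computed from the training set.
preprocess = transforms.Compose([
    transforms.RandomHorizontalFlip(),      # augmentation: random flips
    transforms.RandomRotation(degrees=90),  # augmentation: random rotations
    transforms.Resize((224, 224)),          # resizing to the network input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.25]),
])

# A single convolutional block with batch normalization, as popularized by Ioffe and Szegedy [114].
conv_block = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # normalizes activations over each mini-batch
    nn.ReLU(),
)
```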
Many of these data processing techniques have been directly applied to electron microscopy. For example, Fujishiro et al [118] employ data augmentation on training images generated by SEM for a convolutional neural network (CNN), and report a high defect classification accuracy. Horwath et al [68] state that batch normalisation stands out as the most critical factor among the components essential for acquiring features from 1024 × 1024 TEM images, reaching 95% accuracy. Additionally, feature selection was employed by Dahy et al [119] to process images generated by SEM, and their work achieves 97% overall accuracy.

Workflow in electron microscopy analysis
Electron microscopy analysis typically follows a structured workflow, dominated by image acquisition and data analysis. Image acquisition involves ensuring comprehensive coverage and high-quality images across various magnifications [120], quality parameters [1], beam settings [121] and image restoration steps [122]. Acquisition is a time-intensive task: for example, Lefman et al point out that it takes about 13 h to collect images from 95 specimens by TEM [123]. Data analysis is focused on answering research questions by identifying and measuring features in images [124], but traditional electron microscopy analysis involves manual intervention [125], so it requires professional skills, experience, and time [17].
In order to reduce manual electron microscopy analysis, researchers have focused on the automation of image acquisition. For example, Xu et al present DL-based automation software for electron microscopy [126]. Similarly, Roels et al [127] provide a DL-based plugin for the image-analysis software ImageJ [128] to restore images generated by TEM; their method can accelerate data acquisition by a factor of 4 without significantly damaging data quality. Roccapriore et al present a method based on deep CNNs to automate STEM operation, including beam control [129], and report that their method can accurately predict the atomic lattice.
Automating the data analysis is also important. For instance, Sun et al use a DL network to segment nanoparticles, extract nanoparticle shape and perform statistical analysis on images produced by SEM and TEM [130]. Their work shows 86.2% accuracy and can process around 11 images per second using an embedded processor. Similarly, Shen et al present a DL-based analysis system to process images generated by electron microscopy [131], and report that their method can replace manual analysis in defect detection scenarios. Moreover, in order to avoid time-consuming structural identification by humans, Madsen et al introduce a DL-based method [132]. Their method, trained on simulated images, can interpret experimental images captured by high-resolution TEM and match the performance of an experienced microscopist.

CNNs development
The history of CNNs can be traced back to the 1980s, when a new network named the neocognitron was introduced, which could distinguish stimulus patterns by the difference in their shapes [133]. Although the neocognitron was a ground-breaking invention at the time, it was implemented using a simple forward-feeding technique, so its performance was still limited. In order to overcome the disadvantages of forward-feeding networks, LeCun et al [134] applied back-propagation to a constrained, weight-sharing architecture, which decreases the number of free parameters. According to the authors, this approach was first applied to handwritten ZIP code recognition, and the convergence of the network was greatly improved. This effectively removed the barrier to applying networks to large tasks.
The first milestone in the development of CNNs was LeNet-5, which was applied to document recognition [135]. The network structure, shown in figure 1, consisted of convolutional layers and pooling layers, followed by a fully connected layer for the classification task. In this landmark study, LeCun et al showed that LeNet-5 outperformed the other methods evaluated on document recognition tasks.
The next milestone was the introduction of AlexNet, which includes 650 000 neurons and 60 million parameters. It delivered state-of-the-art performance on a large dataset named ImageNet ILSVRC-2012 [109], which has about 1.2 million pictures in 1000 classes. The unique aspect of AlexNet was that it was a deep CNN, containing 5 convolutional layers and 3 fully connected layers. The significant contribution of this work is that it showed a deep CNN can be trained efficiently on large datasets. Building on AlexNet, another very deep convolutional network was introduced in 2014: Simonyan and Zisserman [138] introduced VGG (Visual Geometry Group, Oxford)-16 and VGG-19, very deep CNNs with 16 and 19 layers respectively. The unique aspect of the VGG networks is that they push the depth of the network to 16-19 layers by using very small (3 × 3) convolution filters. This meant that the VGG network could learn more from the dataset due to its depth, and it achieved state-of-the-art performance on the ImageNet benchmark dataset.
However, deep neural networks still have drawbacks when the depth of VGG-style networks is increased further; for example, training of very deep networks becomes difficult to converge. In order to overcome this slow convergence, He et al [139] developed the Residual Network (ResNet), shown in figure 2, which has lower complexity than the VGG networks despite increasing the depth to 152 layers. The authors showed that the Residual Network uses shortcut connections that skip one or more layers, allowing later layers to access the input directly, which helps to mitigate the vanishing gradient issue. In practice, ResNet50 refers to a Residual Network consisting of 50 layers, while ResNet101 corresponds to a Residual Network consisting of 101 layers.
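To illustrate the shortcut connection described above, here is a minimal sketch of a residual block in PyTorch; it is a simplified version of the idea in [139], omitting the downsampling and bottleneck variants used in the full ResNet architectures.

```python
# A minimal sketch of a ResNet-style residual block illustrating the skip connection.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                        # the shortcut keeps a copy of the input
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                # skip connection: add the input back
        return self.relu(out)

# Example: a (batch, channels, height, width) tensor passes through with its shape unchanged.
block = ResidualBlock(channels=64)
y = block(torch.randn(1, 64, 56, 56))       # shape: (1, 64, 56, 56)
```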
With these advances, CNNs became ideal for electron microscopy tasks.

Applications of CNNs
CNNs are often used to classify nanoparticles in micrographs generated by TEM. For example, Koyama et al [140] integrate VGG-16 with transfer learning (see section 9) in their experiments to classify Pt nanoparticles. In summary, CNNs have proven to be an efficient solution for classifying nanoparticles in TEM images with high accuracy, which can significantly accelerate the analysis process by reducing the need for human intervention. CNNs are also very popular for image segmentation. For instance, Zaimi et al [146] reported that AxonDeepSeg, an open-source software package based on a CNN that they implemented [146], can achieve 85% accuracy on SEM images of rat models, 81% accuracy on human SEM images, 95% accuracy on mouse TEM images and 84% on macaque TEM images. In this case, their test dataset contained microscopy images from both SEM and TEM in a form of data fusion, as shown in figure 3. In another study on segmenting cryo-electron tomograms, the training dataset contains 383 annotations and the test set has 712 annotations, but all the annotations are from the same, manually annotated tomogram. That work is validated on TOMO110, generated by a 300 kV Thermo Fisher cryo-TEM Titan Halo with a Gatan K2 direct electron detector, and on EMPIAR-10110 [149]; as a result, precision and recall increase by over 10% when compared with the result obtained from segmenting the unrestored cryo-tomographic images. Similarly, Zhou et al present a machine learning method based on a U-Net and RL pipeline to segment cryo-electron tomograms that have low SNR [148]. In their work, the U-Net produces the initial segmentation, and 2D refinement of the segmentation is performed by RL. Their training dataset consists of 2000 simulated images, and the test datasets are from EMDataResource [150] and a tomographic reconstruction of lipid vesicles containing monomeric mitochondrial F-type ATP synthase [151]; both are real cryo-tomographic images. Their model surpasses a pure U-Net in terms of Dice-Sorensen coefficients [152]. Overall, CNNs are capable of segmenting nanoparticles in TEM images, even at low SNR, and can replace manual analysis in such scenarios.
Object tracking is another application of CNNs. Faraz et al [66] applied a U-Net model (CNN based), using simulated micrographs as test data together with micrographs generated by combining simulated nanoparticles with real micrographs. Their method performs well on PdO nanoparticles, but they suspect that the smallest nanoparticles may be missed because of their low SNR. CNNs can nevertheless be employed to replace humans in tracking nanoparticles, which can save a significant amount of manual effort.
Finally, CNNs have been used to determine the interface between materials. Yoshioka and Honda [153] reported that a CNN can correctly recognize the interface between crystalline 4H-SiC and an amorphous insulator in a cross-sectional TEM image. Their study used 2000 images for training and validation, and 2000 images for testing, which is a generous test set. However, the validity of the peak at 11 nm⁻¹ obtained by the CNN model remains undetermined, as it is challenging for a human observer to accurately classify these ambiguous regions due to the overlap between regions caused by their vertical alignment. CNNs may be superior to human researchers for some tasks, but ultimately researcher expertise is still necessary for verification (SNR not reported).
A comparison between different types of DL algorithms is introduced in section 11.

An early approach to sequential data, and a precursor to recurrent architectures, was the time-delay neural network (TDNN). The major improvement of this type of network is that it processes sequences of data using both the current input and the data from previous time steps. TDNNs became a popular network for sequential data since they have the ability to model temporal dependencies. Although TDNNs work well in many applications, it is difficult for them to learn long-term dependencies due to their fixed window size, so the number of parameters increases significantly when TDNNs are tasked with long-term dependencies [159].

Recurrent neural networks (RNNs)
At the beginning of the 1990s, recurrent architectures appeared. For instance, Elman [160] presented the simple recurrent network (SRN) shown in figure 4. In this network, the hidden layer has recurrent connections, so the state of the hidden layer at a given time has a conditional dependency on its previous state [159]. Although the SRN is a powerful network in theory, it is hard to train due to the vanishing gradient problem [161].
In order to overcome this issue, many techniques have been proposed. For example, Squartini et al [162] proposed the Recurrent Multiscale Network, and Nair and Hinton [163] proposed the rectified linear unit (ReLU) activation function, but these can only reduce the vanishing gradient effect. Meanwhile, other researchers focused on changing the RNN architecture to address this issue. For instance, Hochreiter and Schmidhuber [164] invented Long Short-Term Memory (LSTM), an RNN-based network whose 'vanilla' version is now among the most popular LSTM architectures [165]. As shown in figure 5, it has a complex structure in which input gates, forget gates and output gates reside on its hidden neurons [166]. The network is able to control the flow of information to the hidden neurons using these gates, and in this way it can remember information over longer periods than other RNNs. Based on the LSTM, the Gated Recurrent Unit (GRU) was proposed by Cho et al [167], which is faster and less memory intensive than LSTM. Although RNNs are commonly used to process time series data, they can still be applied to image processing tasks. For example, Vinyals et al [168] showed that an LSTM can produce a text-based description of an image by receiving the features extracted by a CNN, and Mou et al [169] apply an RNN model to classify images; this capability is likely to be highly applicable to electron microscopy in the coming years.
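The following is a minimal sketch (an assumed toy architecture, not the model of [168] or [169]) of the pattern just described: a CNN extracts features from each image or video frame and an LSTM consumes the resulting sequence, for example for frame-by-frame analysis of micrograph videos.

```python
# A minimal sketch of a CNN feature extractor feeding an LSTM over a sequence of frames.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, feature_dim=64, hidden_dim=128, num_classes=3):
        super().__init__()
        self.cnn = nn.Sequential(                  # tiny per-frame feature extractor
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(), nn.Linear(16 * 4 * 4, feature_dim),
        )
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):                     # frames: (batch, time, 1, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))     # (batch*time, feature_dim)
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)                  # gated memory over the sequence
        return self.head(out[:, -1])               # prediction from the last time step

logits = CNNLSTM()(torch.randn(2, 10, 1, 64, 64))  # 2 clips of 10 frames each
```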

Applications of RNNs
Applications of RNNs to micrographs have already begun to be reported. The typical application of RNNs to micrographs is for segmentation tasks. For example, Chen et al [170] use a fully convolutional network (FCN) and RNNs to segment 3D biomedical images from 3D electron microscopy. They use two types of test data in their work: a 3D neuron dataset from 3D electron micrographs and a 3D fungus dataset from serial block-face SEM. On the fungus dataset, the Vrand (foreground-restricted Rand score) and the Vinfo (information theoretic score) are 0.9753 and 0.9870, respectively, and the pixel error is 0.0215. A significant contribution of their work is that it provides a new way to migrate the excellent performance of 2D architectures to 3D contexts (SNR not reported). Similarly, Liu et al [171] use an RNN to identify and segment mitochondria, and a fully residual CNN to segment the endoplasmic reticulum (ER). Their training and test data are from the ATUM-SEM dataset, in which all the data are produced by SEM, as shown in figure 6. For ER segmentation, 60 serial sections are manually labeled, of which 49 are training data and the rest are testing data; for mitochondria segmentation, 15 slices are training data and 5 slices are testing data. Their work outperforms U-Net, FFN-2D and Mask R-CNN, reaching a Jaccard index [172, 173] of 0.8021, a recall of 0.8442, and a Dice coefficient of 0.8891 (SNR not reported). Moreover, Linsley et al [174] built a GRU network to segment neurons from images generated by a high-resolution serial electron microscope. Their test dataset is referred to as STAR and is publicly available [175]. Their results surpass two CNN-based networks, with precision and recall values of 0.994 and 0.853, respectively; these values are the best among the three networks, and their model has the lowest number of parameters (126k), although the SNR value was not mentioned.
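For readers less familiar with the overlap metrics quoted above (the Jaccard index and the Dice coefficient), here is a minimal NumPy sketch of how they are computed for binary segmentation masks; the example masks are placeholders.

```python
# A minimal sketch of the Jaccard index and Dice coefficient for binary masks.
import numpy as np

def jaccard_index(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return intersection / union if union else 1.0

def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """2*|A n B| / (|A| + |B|); equals 2J/(1+J) for Jaccard index J."""
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2.0 * intersection / total if total else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(jaccard_index(pred, target), dice_coefficient(pred, target))  # 0.5, 0.666...
```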
Beyond segmentation, RNNs have been used for prediction as well as object tracking. For example, Taha et al [176] trained an RNN to detect real virus levels in images generated by TEM. Their test data is a SARS-CoV-2 dataset of 519 images from patients with COVID-19 in Italy. Their work outperforms general regression neural networks (GRNNs), and they also show that their network can predict virus levels with high accuracy (SNR not reported). Additionally, Liu et al [177] present a neuron tracing network based on an RNN and a CNN. The CREMI dataset was used as a test dataset along with another test dataset generated using SEM, and a link to the data can be found in [178]. On the first dataset, their method achieves a variation of information (VI) of 1.2491 and an adapted Rand error of 0.2534, outperforming a 2D U-Net and flood-filling networks; on the second, it obtains a VI of 3.3789 and an adapted Rand error of 0.8954, again outperforming the other two networks (SNR not reported). Overall, RNNs are capable of predicting virus levels as well as tracing structures during experiments.
Overall, RNNs have already demonstrated great potential for automating tasks in the application of electron microscopy and they can play a crucial role as video technology is used more frequently in electron microscopy.
A comparison between different types of DL algorithms is introduced in section 11.

Transformers in microscopy
Transformer development
A Transformer is a DL algorithm that was originally designed for natural language processing tasks. The RNN Encoder-Decoder architecture, introduced by Cho et al in 2014 [179], is the foundation of the Transformer. In this architecture, the encoder is an RNN that processes the input data and produces a fixed-length representation, while the decoder is an RNN that uses the output from the encoder to generate the output. Although this architecture has some advantages, it still struggles with long sentences. In order to address this problem, the attention mechanism was introduced: Bahdanau et al [180] integrated attention with the encoder in their RNN Encoder-Decoder, and the result was shown to perform very well on long sentences. Subsequently, Luong et al [181] highlighted how the combination of attention and an LSTM Encoder-Decoder can achieve top performance on both the WMT 2014 [182] and WMT 2015 [183] challenges.
Although the RNN Encoder-Decoder with an attention mechanism performs well, the computational cost of the recurrent network is still prohibitive. In order to address this, the Transformer network shown in figure 7 was presented by Vaswani et al [184]. The Transformer is still based on the Encoder-Decoder architecture with attention, but it does not use any recurrent or convolutional mechanism: all the recurrent layers are replaced by multi-headed self-attention that can be trained and executed in parallel on multiple GPUs. As a result, training of the Transformer is much faster than that of recurrent and convolutional networks, and the Transformer jumped to the top of the leaderboard on both the English-to-French [182] and English-to-German [183] tasks of WMT 2014.
Is the Transformer suitable for image recognition or image processing tasks? The answer can be found in the Vision Transformer presented by Dosovitskiy et al in 2020 [185]. According to the authors, the Vision Transformer, which treats an image as a sequence of patches, matches or surpasses the top convolutional networks on many image classification datasets, while its training cost is much lower. This approach has the potential to become the dominant technology in a variety of electron microscopy applications.
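The core operation behind both the Transformer [184] and the Vision Transformer is scaled dot-product self-attention. The following is a minimal single-head sketch; production implementations add multiple heads, masking, positional encodings and learned projection layers, and in a Vision Transformer the tokens are image patch embeddings.

```python
# A minimal sketch of scaled dot-product self-attention.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (batch, tokens, dim); for a Vision Transformer the tokens are image patches."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                  # project inputs to queries/keys/values
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # similarity between every pair of tokens
    weights = F.softmax(scores, dim=-1)                    # each token attends to every other token
    return weights @ v

dim = 32
x = torch.randn(1, 16, dim)                                # e.g. 16 patch embeddings
w_q, w_k, w_v = (torch.randn(dim, dim) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)                     # shape: (1, 16, 32)
```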

Applications of transformers
Transformers have already been used in electron microscopy for a variety of applications, including segmentation tasks. For example, Wang et al [186] used a Transformers Enhanced Segmentation Network (TESN) to segment nanoparticles in images generated by TEM. The authors had 104 TEM images containing 2169 nanoparticles; examples are shown in figure 8. The dataset was expanded to 416 images by rotating the images by 270°, 180° and 90°, which is a standard data augmentation technique. A total of 344 TEM images and 72 TEM images were selected as the training and test datasets, respectively. The TESN demonstrates good segmentation performance even when the edge of a nanoparticle is fuzzy and several nanoparticles overlap. Additionally, the authors report that the performance of TESN is close to human annotation, which is still the preferred method of establishing the ground truth. In the future, it will be interesting to see how this approach performs with more complex nanoparticle shapes and wider distributions of sizes (SNR not reported).
Li et al [187] also found that Dense Transformer Networks (DTN) can perform segmentation tasks on brain images produced by electron microscopy. In their dataset [188], all the images are 224 × 224 patches randomly selected from 100 image slices of 1024 × 1024 pixels, used as the training dataset. The authors evaluate the performance of their work by comparing its AUC [189] with a U-Net model (CNN based); the AUC improves from 0.8676 for U-Net to 0.8953 for DTN, where a higher AUC represents a higher true positive rate and a lower false positive rate. However, their current work only pairs a single encoder layer with a single decoder layer, and the performance of multiple layers will need to be explored in future (SNR not reported).
Transformers can be used to perform classification tasks as well. For instance, Duan et al [190] found that the classification of pollen types can be performed by a Vision Transformer, on a scale very different from the nanoparticles previously discussed. Their dataset consists of 42 pollen classes produced by a SU-8220 field-emission SEM, with 25 369 images used for training and validation, and 805 images covering 23 pollen classes selected as the test dataset. This was a much more challenging multi-class classification task than has previously been reported in the DL-enabled microscopy domain. According to the authors, the Vision Transformer can reach the same performance as CNN-based networks under the same conditions, but the required model parameters and training time are much smaller, making it more efficient and computationally superior (SNR not reported). Similarly, Zhang et al [191] report fewer model parameters and less computational cost than CNN-based and RNN-based networks, so an increase in the applications of Transformers in this field can be expected. Transformers are also a very active area of development in computer science, suggesting future improvements that could significantly benefit microscopy.
A comparison between different types of DL algorithms is introduced in section 11.

Genetic algorithms (GA) in microscopy
GA development
GA is an unsupervised learning technique motivated by natural selection and evolution in the real world. Strictly speaking, GAs are metaheuristics used for all kinds of optimization problems rather than machine learning models, but they are useful in many machine learning problems. A GA optimizes a problem by evolving a population of candidate solutions, often referred to as individuals, animals, or genotypes. The target of the GA is to continuously improve the quality of the solutions through the mechanisms of selection and variation, resulting in better-optimized outcomes. The GA was first introduced by Sampson [192] in 1967, where a series of bit strings is used to represent the problem controlled by the algorithm. The author describes the concept and presents the details of how natural selection and genetics are applied to computer programs to solve problems; selection, mutation and crossover applied to computer programs were demonstrated by solving sophisticated problems. Goldberg and Holland [193] further developed the GA according to Darwin's theory of evolution in 1988, and Mitchell et al [194] introduced the concept of fitness landscapes in 1991. The main contributions of the latter work are methods to measure the effect of crossover and the dependency of GA performance on the landscape, respectively.
A milestone in the history of GAs was the seminal paper published by Holland [195]. Here the author describes how to apply crossover, mutation and selection in practice. The algorithm prefers the fittest strings as parents, since their offspring will have a higher likelihood of survival and superior traits compared to their parents. At this point, GAs became a part of the evolutionary algorithm ecosystem. In more recent years, GAs have been used for a variety of applications outside the life sciences, such as scheduling and finding the shortest path, as well as in modeling and simulation where the use of random functions is required [196]. More broadly, evolutionary algorithms are utilized to tackle problems for which there is no well-established, efficient solution [197]. An overview of a GA is shown in figure 9.
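As a concrete illustration of the selection, crossover and mutation steps described above, here is a minimal GA sketch that maximizes a toy two-parameter fitness function; the fitness is a stand-in for, say, an image-quality score as a function of two instrument settings, and all values are placeholders.

```python
# A minimal sketch of a genetic algorithm: selection, crossover and mutation.
import random

def fitness(genes):
    # Toy objective with an optimum at genes = [0.3, 0.7]
    return -((genes[0] - 0.3) ** 2 + (genes[1] - 0.7) ** 2)

def evolve(pop_size=20, generations=50, mutation_rate=0.2):
    population = [[random.random(), random.random()] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest half as parents
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Crossover: each child takes each gene from one of two random parents
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            children.append([random.choice(pair) for pair in zip(a, b)])
        # Mutation: random perturbation explores new regions of the search space
        for child in children:
            if random.random() < mutation_rate:
                i = random.randrange(len(child))
                child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
        population = parents + children
    return max(population, key=fitness)

print(evolve())   # approaches [0.3, 0.7]
```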

Applications of genetic algorithms
Because of how it works, a GA can be considered an efficient way of 'tweaking' settings to find an optimal configuration, which makes it ideal for optimizing the settings of an electron microscope. For example, Kim et al [198] suggested that the combination of a generalized regression neural network (GRNN) and a GA can be used to search for the best resolution of an SEM. A prediction model of the SEM's resolution was built using a GRNN, and the GA was employed to enhance the prediction model. The authors use 3D plots to evaluate the impact of equipment settings on the resolution. As a result, they successfully optimized the SEM resolution, and the same approach can be used to characterize other components of an SEM (SNR not reported). GAs can replace humans in tuning electron microscopy configurations, which can significantly save time during image preparation.
Figure 9. The overview of a genetic algorithm (GA), revised from [58]. In a GA, candidate solutions for the next generation are chosen in the reproduction process. The GA then applies random changes to these candidates in the mutation process, allowing for the exploration of new regions of the search space. Additionally, the GA creates new candidate solutions by exchanging genetic material between at least two candidates via crossover.
Beyond the instrumentation, GAs can be used to reconstruct 3D atomistic structures. For instance, Backer et al [199] used a Bayesian GA to reconstruct nanoparticle images generated by annular dark-field STEM (ADF STEM). In their experiment, they used a simulation dataset and compared the reconstructions produced by the GA with the ground-truth model in order to evaluate the result. They also applied their method to experimental data, comprising a Pt nanoparticle catalyst recorded as a time series of 25 frames. The findings reveal the potential for obtaining trustworthy representations of beam-sensitive nanoparticles undergoing dynamic processes, based on images captured using sufficiently low incident electron doses. However, their method is only validated on Pt nanoparticles, and SNR was not mentioned in their work.
Other analyses of the morphological properties of nanoparticles using GAs have also been reported. Lee et al [200] showed that the morphological properties of nanoparticles can be screened by a GA with 99.75% accuracy on 155 456 nanoparticles from 814 TEM images, with a false discovery rate of only 0.25%. However, this study only included gold nanoparticles and had the disadvantage that the result of the training stage eventually had to be checked by eye, meaning that it will not scale to large datasets (SNR not reported). This limitation may potentially be overcome by leveraging another advantage of GAs: they can be integrated with other techniques to perform annotation. For instance, Rusu and Wriggers [201] report that VolTrac (GA based) can trace alpha helices in images produced by cryo-EM using a tabu search and a bidirectional expansion; examples are shown in figure 10. According to the authors, the GA performs a global search to detect fragments of helical regions. Their results are validated on both experimental and simulated data, eventually predicting helices with more than six residues in simulated and experimental maps at 4 Å to 10 Å resolution with an accuracy between 70.6% and 100%. The main limitation of their work is that the performance is generally insufficient in low-resolution cases (SNR not reported). In general, we can conclude that GAs can replace human researchers in analysing the morphological properties of nanoparticles in TEM images, depending on the resolution.
GAs are still an emerging area in microscopy, but they have particular applications in automating manual tasks that are repetitious, complex or labor-intensive, so their main impact will likely be to accelerate scientific progress.
A comparison between different types of DL algorithms is introduced in section 11.

Transfer learning (TL) in microscopy
TL development
TL is a part of machine learning that concentrates on utilizing previously trained models as a foundation for tackling new, similar cases. The first paper on TL can be traced back to Bozinovski and Fulgosi in 1976 [202]. Their work provides a mathematical model and geometric understanding of TL and a method to evaluate results as positive, negative, or neutral. TL is intrinsically interesting to science, where patterns and behavior repeat, and particularly applicable to microscopy, where leveraging knowledge from past samples can have a large impact on both productivity and discovery.
Although the concept of TL is not new, the combination of TL and machine learning emerged in the 1990s. Pratt et al [203] introduced the combination of TL with neural networks, which was shown to accelerate the learning process on a target problem by reusing the weights obtained from a network previously trained on a related source task. The general framework of this application of TL is shown in figure 11. Pratt later expanded this work [204] to include Discriminability-Based Transfer (DBT). This approach estimates the usefulness of hyperplanes established by the source weights in the target network using an information measure, and accordingly adjusts the magnitudes of the transferred weights. As a result, this method learns faster than a new model trained from scratch.
Figure 11. The general framework of TL, revised from [203, 205, 206]. This figure illustrates three different approaches to TL. The top approach shows indirect TL, which involves extracting information in symbolic form from one neural network and inserting it into another; this approach has the slowest learning speed of the three [203]. The centre approach shows direct transfer from a single source to a single target network [203]. The bottom approach shows direct transfer of multiple source networks to different parts of the target network, which is suitable for complex tasks [203]. Backpropagation is used for both regression and concept learning [203].
Beyond efficiency gains, there are other advantages of TL. Ba and Caruana [207] report that their pre-trained shallow network performs better than deeper convolutional models trained from scratch on the CIFAR-10 image dataset [111] as well as on TIMIT phoneme recognition tasks. TL can also be used to overcome challenges associated with data scarcity, a problem common to many areas of science and technology. In DL, small datasets usually lead to overfitting, but large scientific datasets are often infeasible due to high costs, limited resource availability or the sheer scientific difficulty of making samples. For this reason, Barman et al [208] examined transfer learning on ResNet50 (see section 5.1), showing that TL on a small dataset can achieve an accuracy of 95%, whereas ResNet50 trained from scratch only reaches 50%.
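A minimal sketch of this TL recipe, assuming PyTorch/torchvision and a placeholder three-class micrograph problem, is shown below: a ResNet50 pre-trained on ImageNet is loaded, its convolutional backbone is frozen, and only a new classification head is trained on the small target dataset.

```python
# A minimal sketch of transfer learning with a pre-trained ResNet50.
import torch.nn as nn
from torchvision import models

num_classes = 3                                    # placeholder: e.g. three nanoparticle shapes

# The weights argument requires torchvision >= 0.13; older versions use pretrained=True.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

for param in model.parameters():                   # freeze the pre-trained backbone
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, num_classes)   # new, trainable classification head

# Only the new head's parameters are passed to the optimizer during re-training.
trainable = [p for p in model.parameters() if p.requires_grad]
```

Whether to freeze the whole backbone or fine-tune its later layers as well is a design choice: freezing is cheaper and safer on very small datasets, while fine-tuning can help when the target images differ substantially from ImageNet, as micrographs usually do.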
Although the concept of TL is compelling, it has not been widely used in science because, to date, it can generally only be applied to image data that are very similar and it typically requires large datasets. To some degree, this is less of a problem in microscopy.

Applications of transfer learning
TL has only recently entered the electron microscopy field. For example, Matson et al [209] use a ResNet50 pre-trained on ImageNet to classify the different shapes of carbon nanoparticles in TEM images, achieving an accuracy of 85% (SNR not reported). Moreover, Zhang et al [210] used VGG16 with TL to predict the thickness of TEM samples from 4D data generated by STEM. That method is pre-trained on the ImageNet dataset, re-trained on a relatively small dataset, and validated on 4D STEM data. Their regression method reaches 70% root mean square error on samples of less than 35 nm thickness, and their classification model reaches 66% accuracy in producing results that were within 2 nm increments of the high-angle annular dark-field STEM thickness; while the performance is still low, the task is very challenging (SNR not reported). Similarly, Luo et al [211] used a VGG16 pre-trained on ImageNet to identify nanoparticles of carbon nanofibers/nanotubes. Their pre-trained VGG16 was re-trained on a small dataset of 5323 TEM images. Eventually, their model achieved 90.9% accuracy on a new source dataset with 4 classes across 4128 TEM images, and 84.5% accuracy on an 8-class dataset with more than 4900 TEM images (SNR not reported). In another example, a CNN model pre-trained on ImageNet was applied to a classification task of Pt nanoparticles by Koyama et al [140]. Their training dataset contained 1050 TEM images in 3 classes and the test dataset contained 300 TEM images. Their work achieves 94% accuracy, and an example of the visual representation of the image classification associated with TL is shown in figure 12 (once again, the SNR was not reported).
Beyond VGG16, other CNN models can also leverage the advantages of TL. One example involves deep CNNs such as InceptionV3 [76] and ResNet50 pre-trained on the ImageNet database [212]. However, the authors remove the first and last fully connected layers of the pre-trained models, since the similarity between the TEM dataset and ImageNet is low, and the networks are re-trained on a very small TEM dataset provided by the authors, which only has 190 TEM images and an average peak SNR (PSNR) [213] of approximately 22.25. Empirical interventions like this are not desirable, but the networks were eventually validated on a small test dataset of 21 TEM images and were faster to obtain than training a new model from scratch. Similarly, Dabiri and Kassab [214] validate the performance of TL with AlexNet, VGG16 and SqueezeNet on a classification task for the SARS-CoV-2 virus. Their models are also pre-trained on the ImageNet database and re-trained on a small training dataset of around 1280 TEM images. After re-training, these networks are validated on a test dataset of 320 TEM images. As a result, VGG16, AlexNet and SqueezeNet reach 75.3 ± 4.7%, 77.8 ± 4.5% and 77.8 ± 4.5%, respectively. However, the SNR value was not mentioned in their work.
Overall, the application of transfer learning in electron microscopy is on the rise, but it is currently largely restricted to using ImageNet to pre-train a CNN model, even though the similarity between the images contained in ImageNet and the actual targets is quite low. Although a pre-trained CNN model can still achieve good performance even when re-trained on a very small dataset, many more micrographs will need to be generated if performance and stability are to improve. Pre-training on a large scientific, or ideally electron microscopy, database would alleviate these issues if the community could come together to contribute millions of images.
A comparison between different types of DL algorithms is introduced in section 11.

RL in microscopy
RL development
RL is a subfield of both machine learning and artificial intelligence that deals with the challenge faced by an agent learning behavior through a process of trial and error within a constantly changing environment [215]. RL is neither supervised nor unsupervised learning, since it does not require predefined training data but instead adapts and learns via a defined reward (or penalty) system. The architecture of RL is shown in figure 13. RL was first used for optimal control. According to Bellman and Kalaba [217], dynamic programming can determine the usefulness of a communication channel for conveying information in statistical communication theory, and for this a new type of computing was required. Dynamic programming is a technique for addressing complex problems by dividing them into smaller, more manageable sub-problems; each sub-problem is then solved only once, and its solution is saved for future use. In the same year, the Markovian decision process (MDP) was proposed based on dynamic programming [218], which is a milestone in the history of RL and is shown in figure 14. MDPs belong to a category of decision processes that follow a random sequence, where the cost and the change in state depend only on the current state of the system and the action taken in that state [219]. Howard [220] originated a technique for finding optimal policies in MDPs called the policy iteration method.
A milestone in the history of RL is temporal-difference learning. Sutton [222] showed the convergence of the temporal-difference method to an optimal solution in specific scenarios and connected it to RL. Temporal-difference learning is a technique for learning from delayed rewards and can be used for both prediction and control tasks, in which control decisions are made by optimizing the expected outcome. According to Silver et al [223], temporal-difference learning is a highly successful and widely used approach to RL, since it can reach expert-level play in games such as chess, checkers and backgammon.
Another important addition to RL is Q-learning, presented by Watkins [224]. The main idea of Q-learning is to learn the optimal way of acting in controlled Markovian domains [224]. Based on Q-learning and the TD-Gammon architecture [225], the Deep Q-Network (DQN) was then developed [226]. DQN is a combination of RL and a deep neural network with stable stochastic gradient descent [226], achieving state-of-the-art results in six out of seven Atari 2600 computer games. Building on DQN, Mnih et al [227] present an implementation referred to as the Asynchronous Advantage Actor-Critic (A3C) algorithm, which employs multiple agents that contribute their individual learning to the total knowledge of the whole network and update their optimal policy function based on the action value obtained from the action-value function. The action-value function helps the agent choose actions that will result in the highest possible expected reward over a period of time. As a result, their approach can be trained much faster than a conventional DQN and can also achieve better results.
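To make the Q-learning idea concrete, here is a minimal tabular sketch of the update rule: the agent refines its estimate Q(s, a) of the long-term reward of taking action a in state s from each observed transition; DQN replaces the table with a deep neural network. States, actions and hyperparameter values here are placeholders.

```python
# A minimal sketch of tabular Q-learning with epsilon-greedy action selection.
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1     # learning rate, discount factor, exploration rate
Q = defaultdict(float)                     # Q[(state, action)] -> estimated long-term reward
actions = [0, 1]                           # placeholder action set

def choose_action(state):
    if random.random() < epsilon:                          # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])       # otherwise exploit current estimates

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in actions)
    # Temporal-difference update toward reward + discounted future value
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```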

Application of RL
RL is applicable to optimization problems in electron microscopy, such as adaptive scanning methods. For example, Ede [228] integrated an RNN with RL to optimize the scanning problem encountered when using STEM. In this case, RL based on an MDP is used to train the RNN to work together with a feed-forward CNN that completes sparse scans. The training and testing datasets contain 15 815 and 3954 STEM images, respectively. This method was shown to always match or outperform the performance of static scans, thereby reducing electron-beam damage and scanning time while retaining as much information as possible. However, a major disadvantage of their work is that the error caused by the generator increases significantly when it is trained to handle multiple scan paths, compared to when it is only trained for a single scan path (SNR not reported). A related approach is reported by Schloz et al [229]. RL can thus optimize electron microscopy parameters, helping researchers save time in determining optimal configurations for their experiments, but care should be taken to avoid biases that can be amplified by RL [230].
RL can also be used to improve the image quality of micrographs. For instance, Fan et al [231] employ a DQN to extract the maximum number of good-quality images generated by single-particle cryo-electron microscopy in a fixed time period. Their training dataset has 4017 micrographs and the test dataset contains 3969 images. Their results outperform simulated annealing and a GA, but future work will be required to take into account a greater portion of the hierarchical process involved in cryo-EM data collection (SNR not reported). Another example is the work of Jang et al [232], who used a deep neural network (DNN) and RL to obtain high-quality SEM micrographs by automatically adjusting the control parameters. In their work, SEM parameters such as working distance (WD), brightness (BRT) and contrast (CNT) are controlled by RL, and the DNN evaluates the image quality. Their training and test datasets contain 27 235 and 3037 images, respectively. As a result, their method can generate high-quality images with 90% accuracy as assessed by human researchers (SNR not reported). This shows that RL is also a viable option for enhancing micrograph quality by determining optimal imaging settings.
A comparison between different types of DL algorithms is introduced in section 11.

Selecting the right approach
CNNs are suitable for image-related tasks and can achieve state-of-the-art performance on the ImageNet LSVRC-2010 dataset, with more than 1.2 million images in about 1000 different classes [109]. CNNs can be parallelized across multiple GPUs on x86 PCs, ASICs and FPGA-based designs [233], and can be integrated with TL to achieve good performance when the pre-trained CNN is re-trained with a small dataset [234]. However, CNNs suffer from a vanishing gradient problem that needs to be mitigated by techniques such as skip connections [235], and they are not suitable for time series tasks because of their architecture, although this is an area of active research [236, 237]. Their main disadvantage is that training is slow due to the many convolutional layers [184].
RNNs are suitable for processing sequential data [238, 239]. However, RNNs have limited memory when capturing long-term dependencies, so they can also encounter the vanishing gradient problem [240]. This can be improved to some extent using LSTMs [164], but researchers then need to avoid the exploding gradient problem caused by these architectures [241]. This is also an area of active research [242, 243], but the main drawback is that training RNNs takes a long time, since they are difficult to train in parallel [244]. Some researchers have presented solutions to this problem [245, 246], but those techniques may not apply to all types of RNNs.
The greatest advantage of a Transformer is that it can be trained faster than RNNs and CNNs, as it can be easily parallelized [184]. The self-attention used by Transformers can also provide better performance than RNNs on tasks with long-range dependencies [184]. However, the high performance of Transformers relies on data volume [247], and while this can be overcome to some extent by several techniques [248], these still need to be examined before being applied to all types of Transformer. Another disadvantage of Transformers is that they require a large amount of memory to execute [249], a topic that is under investigation in computer science [250, 251] but is currently only validated on specific Transformers under specific circumstances.
GAs can search for global optima [252], but are not guaranteed to find them. GAs can be used on complex and non-linear problems [253] and are highly parallelisable [254]. Additionally, a GA is able to generate a good solution even from incomplete data [255][256][257] and noisy data [257][258][259]. Nevertheless, it can be a challenging and time-consuming process to determine the appropriate parameters for GAs [260], which have their own hyper-parameters in addition to the hyper-parameters that must be tuned for each type of machine learning model. The computational cost of GAs can be expensive when applied to a problem with a large search space [261]. The lack of transparency can be another drawback of GAs under some circumstances [262].
TL can reduce training time and training data requirements [263] and improve the accuracy of prediction [264] by leveraging pre-trained models re-trained on smaller datasets of images from virtually any domain. However, while pre-trained models can be provided for specific datasets, such as the models provided in Tensorflow [265], the new model's flexibility may be constrained, as the pre-trained model may be unable to adjust to new data or tasks. TL can also be computationally expensive if a model needs to be both pre-trained and fine-tuned on a new task, and irrelevant features learned by the pre-trained model can negatively impact the new task [266] without this being apparent to the user.
RL can adapt to changing environments [267], requires a small amount of training data [268], and its learning rates are well characterized [269]. However, RL can easily become trapped in local minima [270], and the process of creating several policies requires a significant amount of computational resources [271].
So how do microscopists choose the right tool for the job? RNNs, CNNs and Transformers can all be used to process images, for tasks such as nanoparticle classification, segmentation and tracking, but the combination of CNNs and TL is a good choice when working on small datasets (which describes virtually all datasets in microscopy, compared with the millions of images these algorithms were originally developed with), since CNNs perform well on almost all image-related tasks. RNNs are capable of processing time series data, so they can be used to track objects in real time and to work with videos. If large datasets are available, replacing CNNs and RNNs with Transformers is a good strategy. Transformers can also be integrated with transfer learning, but this combination still needs to be examined in the electron microscopy field. Usually, a CNN is a good first choice or base model when evaluating an image-related task because it is relatively simple to set up. Based on the performance of the CNN, one can then consider RNNs and Transformers, as they tend to perform better, provided that their disadvantages are not introduced.
GAs do not rely on training data, since they are unsupervised, and they are applicable to any optimization problem because they search for global optima, so they can work in combination with RNNs, CNNs or Transformers to search for the optimal configuration of an electron microscope under fixed conditions. This could significantly save time and resources, and GAs can also work with RL in order to optimize the policy of an agent. RL also does not rely heavily on training data, as it can interact with the environment directly, but it can become trapped in local optima, so combining it with a GA is also a good strategy. Both are capable of searching for suitable configurations of an electron microscope in a changing environment, including the control parameters. When an optimal solution is needed, GAs and RL can relieve researchers from complex and repetitive tasks, such as determining the optimal parameters for electron microscopy to obtain high-quality images. If a local optimum [272] is acceptable in the experiments, RL can be a good choice, as it will be faster than a GA in these scenarios. Otherwise, GAs are the better choice where the global optimum is essential.

Human-assisted workflow
The workflow of DL comprises several steps, as illustrated in figure 16. First and foremost is data acquisition, which comprises data collection and data labeling. Data can be collected either by employing a public dataset or by manual collection, with the latter requiring manual labeling. Secondly, the dataset must be cleaned and normalized or standardized based on its features. Thirdly, the right model should be chosen manually according to the problem to be solved, with reference to section 11. Fourthly, the model needs to be built and trained accordingly. The results should then be inspected and evaluated manually, considering metrics such as accuracy for classification tasks and mean square error for regression tasks, and, if possible, validated by a human researcher on unseen images. Regardless of the model, hyperparameters must be tuned, and if the model fails to converge, an alternative model may be trained with a different architecture. The workflow is considered complete when the performance of the model matches expectations, but since there are numerous decisions to be made by human researchers, this can be an iterative process and the final workflow is likely to be problem specific.
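A minimal sketch of these steps is given below using Keras and placeholder data standing in for labelled micrographs: normalise the data, build and train a small baseline CNN, then inspect a task-appropriate metric (accuracy here, since the toy task is classification). All shapes, class counts and architecture choices are assumptions for illustration.

```python
# Toy end-to-end workflow: acquire -> normalise -> choose model -> train -> evaluate.
import numpy as np
import tensorflow as tf

# 1) Data acquisition (placeholder arrays standing in for labelled micrographs).
images = np.random.rand(200, 64, 64, 1).astype("float32")
labels = np.random.randint(0, 3, size=200)

# 2) Cleaning / normalisation to zero mean and unit variance.
images = (images - images.mean()) / (images.std() + 1e-8)

# 3)-4) Model selection and training (a small CNN as a simple baseline choice).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(images, labels, validation_split=0.2, epochs=5, verbose=0)

# 5) Manual inspection of the evaluation metric; iterate on the model if it is poor.
print("validation accuracy:", history.history["val_accuracy"][-1])
```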

Future opportunities
In this review, we briefly examine electron microscopy and its potential for improvement by combining it with DL. We examine CNNs, RNNs, Transformers, Genetic Algorithms, RL and Transfer Learning, how they are applied in the TEM field in practice, and their future potential. Additionally, we point out the advantages and drawbacks of each and how to choose between them for different problems. CNNs, RNNs and Transformers are all capable of image processing, and CNNs and RNNs can often be improved upon by Transformers. Transfer Learning performs well when it is integrated with CNNs. GAs are well suited to optimization problems, and RL can solve optimization problems in a changing environment.
There are two points that still need to be addressed in electron microscopy. First of all, a standard benchmark TEM image dataset is required in this field; this would benefit the entire community. Reasonably large datasets are unusual in this field, so researchers can implement Transfer Learning to overcome the overfitting problems (which are common in the field) on the datasets they have, but the use of a model pre-trained on natural images such as ImageNet is not ideal. Secondly, the application of Transformers still needs to be evaluated. Transformers have demonstrated their power in computer science, where they can replace RNNs and CNNs in many circumstances, and could have wide applicability in microscopy, but their usage is still low.
In the DL field, researchers are focusing on improving current DL algorithms by introducing new techniques. This includes communities working on expanding the depth of DL algorithms in order to enhance their learning abilities by overcoming hard-to-train problems [273]. Comparatively less effort is being directed at dealing with small datasets, such as those in the electron microscopy field, which require a great deal of effort to develop. Solutions to these issues may need to come from the computational science community rather than from DL researchers. Using simulated TEM images could be one approach to overcoming this problem: although they are not frequently used in this field, the application of simulated TEM images is worthy of more attention.
Finally, one topic that has received less attention is the use of an overlooked but valuable asset of the electron microscopy community: reciprocal space. Electron diffraction patterns, which are generated in TEM when the electron beam is scattered by an object [274], could be another important feature suitable for the classification of materials and nanoparticles, since they provide each structure with a unique fingerprint [275]. It has been shown that a diffraction pattern can be recognized by a computer program [276], but this remains a virtually untapped resource ripe with opportunity.
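As a hedged illustration of the idea, the sketch below computes a reciprocal-space "fingerprint" from a real-space image by taking its FFT power spectrum and radially averaging it, as a stand-in for an experimental diffraction pattern; the image size and any downstream classifier are assumptions for illustration only.

```python
# Reciprocal-space fingerprint sketch: FFT power spectrum + radial average.
import numpy as np

def radial_fingerprint(image, n_bins=64):
    # Power spectrum of the image (proxy for a diffraction/reciprocal-space pattern).
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = power.shape
    y, x = np.indices(power.shape)
    r = np.hypot(y - h / 2, x - w / 2)
    # Radially average the power spectrum so each structure yields a 1-D signature
    # that could be fed to a classifier.
    bins = np.linspace(0, r.max(), n_bins + 1)
    idx = np.digitize(r.ravel(), bins) - 1
    sums = np.bincount(idx, weights=power.ravel(), minlength=n_bins)[:n_bins]
    counts = np.bincount(idx, minlength=n_bins)[:n_bins]
    return sums / np.maximum(counts, 1)

fingerprint = radial_fingerprint(np.random.rand(256, 256))   # placeholder image
print(fingerprint.shape)   # (64,)
```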

Figure 1 .
Figure 1. The architecture of LeNet-5, revised from [135]. C1, C3 and C5 are convolution layers, which can extract features from a given image [136]. S2 and S4 are pooling layers, which reduce the size of feature maps through downsampling [137].
The fusion was imbalanced: nine of the images generated by SEM were used for training and validation, three SEM images were used for testing, 136 TEM images were used for training and validation, and the remaining 25 TEM images were reserved for testing. This led to several limitations. Firstly, the authors report that large resampling factors are required to find a common resolution space, since the resolution ranges of images produced by TEM and SEM are very different. Secondly, different textures and microstructures of the tissue are captured by TEM and SEM separately, making the data fusion incomplete; and finally, the results are not ideal when continuously training the model on SEM and TEM. Combining electron microscopy at different scales shows promise, but more work needs to be done, and SNR is not mentioned in their work. Segmenting tomographs from cryo-TEM is a challenging task due to the very low SNR caused by the limited electron dose [147, 148]. Buchholz et al present Cryo-CARE, based on a U-Net model (CNN-based), to restore cryo-TEM images by increasing contrast and SNR, and to segment tomographs from them [147].

Figure 6 .
Figure 6. Examples of the mouse cortex neural tissue acquired by ATUM-SEM. (A) An example of an aligned image stack that covers approximately 20 × 20 × 10 µm through the ATUM-SEM method. (B) and (C) Examples of mitochondria and other ultrastructures. The green arrows indicate mitochondria; the red arrows indicate vesicles; the yellow arrow indicates the Golgi body, and the purple arrow indicates the endoplasmic reticulum. (D)-(F) Examples of mitochondria segmentation in our training dataset. Reproduced from [171]. CC BY 4.0.

Figure 7 .
Figure 7. The architecture of the Transformer, revised from [184]. The multi-head attention mechanism enables the networks to selectively concentrate on various segments of the input sequence with different weights, as well as to acquire several representations of the input data concurrently.

Figure 8 .
Figure 8. Examples of the nanoparticle TEM image dataset used with the Transformers Enhanced Segmentation Network (TESN). (a) TEM image of the nanoparticle. (b) Schematic diagram of the JSON annotation file. Reprinted from [186], Copyright (2022), with permission from Elsevier.

Figure 9 .
Figure 9. The overview of the Genetic Algorithm (GA), revised from [58]. In GA, candidate solutions for the next generation are chosen in the reproduction process. GA then applies random changes to these candidates in the mutation process, allowing for the exploration of new regions of the search space. Additionally, GA creates new candidate solutions by exchanging genetic material between at least two candidates via crossover.

Figure 10 .
Figure 10. (A) A simulated map obtained by low-pass filtering a GroEL monomer to 8 Å resolution is presented along with the helices predicted by VolTrac (represented as blue tubes). (B) Side and (C) bottom views of VolTrac results (blue cylinders) overlapping the target crystal structure (α-helices represented as yellow ribbons). Reprinted from [201], Copyright (2012), with permission from Elsevier.

Figure 11 .
Figure 11. The framework of Transfer Learning (TL), revised from [203, 205, 206]. This figure illustrates three different approaches to TL. The top approach shows the indirect TL approach, which involves extracting information in symbolic form from one neural network and inserting it into another; this approach has the slowest learning speed among the three [203]. The centre approach shows the direct TL transfer of a single source network to a single target network [203]. The bottom approach shows the direct TL transfer of multiple source networks to different parts of the target network, which is suitable for complex tasks [203]. Backpropagation is used for both regression and concept learning [203].

Figure 12 .
Figure 12. Examples of image classification with the CNN model in association with transfer learning (TL). (a)-(c) A TEM image in Class 0 and results obtained for the model without and with fine tuning (FT), (d)-(f) a TEM image in Class 1 and results obtained for the model without and with FT, and (g)-(i) a TEM image in Class 2 and results obtained for the model without and with FT. Reprinted from [140], Copyright (2021), with permission from Elsevier.
The authors also used an RNN and RL based on an MDP to optimize the scanning method used by STEM when ptychographic reconstruction is required. Their training dataset has 175 images of a MoS2 monolayer specimen, each containing about 10 000 diffraction patterns. Their work has the potential to improve the average reconstruction quality by as much as 27.75% and increase the resolution by 31.59% according to a low-dose experiment. Examples of fine-tuning an RL policy with and without initialization of the RNN on a MoS2 monolayer are shown in figure 15. However, the SNR value is not mentioned in their work.

Figure 15 .
Figure 15. Examples of fine tuning of a policy with RL that (a) has not been initialized and (b) has been initialized via supervised learning. Positions A indicate the scan positions of the first subsequence that is provided to the RNN as part of the initial input. Positions B and C are the scan positions of all predicted sub-sequences at iteration 0 and 10 000, respectively. The trajectories they form during the optimization process are indicated by a dashed blue line. Reproduced from [229]. CC BY 4.0.