Leaf disease detection using deep Convolutional Neural Networks

The automatic recognition of plant diseases is of crucial importance for the current development of agriculture. Fast and efficient identification can greatly reduce the natural, economic, and human resource losses suffered by agricultural practitioners. Deep neural networks allow computers to learn plant disease detection in an end-to-end manner, thereby obtaining better results and higher efficiency. While Convolutional Neural Network (CNN) models have become a well-established tool for detecting plant diseases, the lack of robustness of these models under environmental variation remains a critical concern. Recent research into overcoming this challenge includes domain adaptation (DA) algorithms such as the classic Domain-Adversarial Neural Network (DANN) and the more recent Multi-Representation Subdomain Adaptation Network with Uncertainty Regularization for Cross-Species Plant Disease Classification (MSUN). However, the topic remains under-explored, as the newly developed methods have not been tested on many crop species and diseases. This research focuses on four deep CNN models (MobileNet, VGG, GoogLeNet, and ResNet). The models are developed and tested using the New Plant Diseases dataset on Kaggle, which comprises 70,000+ training images (offline-augmented) and 17,000+ validation images encompassing 38 classes of healthy and diseased plant leaves. The models are cross-evaluated on accuracy and training speed, as well as on their change in performance after optimization and the application of DA methods. With a top accuracy of 86.4% on a test dataset captured in the wild, the results show that transfer learning, model ensembling, and domain adaptation effectively increase the robustness of the models, which will ultimately benefit farmers in detecting plant diseases and deciding on the best treatment in real time.


Introduction
Food security has become a major concern for modern society, as it affects not only the well-being of individuals but also social stability. To ensure food security, improving agricultural efficiency is crucial.
In recent decades, agriculture has faced numerous challenges, one of the most urgent being the destructive impact of plant diseases on crop production. For example, leaf spot and ear rot in corn, late blight and brown rot in potato, rust in soybean, and late blight and yellow leaf curl in tomato are among the most costly diseases worldwide [1]. The plant disease problem is becoming even more of a concern under current climate change.
Climate change alters the distribution and abundance of insects and increases the frequency of extreme weather events, with a wide range of observed and anticipated consequences for crop health worldwide. Projected levels of climate change are expected to promote the emergence of plant diseases. For example, elevated water or temperature stress on plants and wet weather promote the proliferation of fungal and bacterial pathogens [2].
As a result, identifying and managing plant diseases has become increasingly important for ensuring sustainable agricultural practices and the global food supply. Traditional diagnosis methods that rely on human expertise are time-consuming, labor-intensive, and often error-prone. Therefore, a fast, efficient, and accurate identification technology is essential to maximize agricultural productivity and product quality.
At present, thanks to its capacity for feature learning, which allows it to extract features and classify images, the Convolutional Neural Network (CNN) has been widely applied across society, for example in image classification [3], medical disease detection [4], text classification [5], autonomous driving [6], and video analysis and processing [7]. Another advantage of CNNs is transfer learning [8]: convolutional neural networks pre-trained on large datasets (such as ImageNet [9]) can be used as feature extractors for other tasks. When data is limited, transfer learning allows features learned in one domain to improve performance on related but different tasks. Many robust and mature CNN architectures have been developed, such as AlexNet [10], GoogLeNet [11], ResNet [12], VGG-16 [13], and LeNet [14].

Literature Review
Work in the field of using DCNN models to detect plant leaf diseases is summarized in the table below (Table 1), organized by whether Transfer Learning (TL) was employed and which models were adopted.
GoogLeNet with Inception was a popular model to combine with TL techniques, and the results achieved, especially those based on the Plant Village dataset, were usually decent. The research conducted by P. Dong et al. (2022) aimed to tackle the insufficient-data problem by applying transfer learning to Xception and InceptionV3 models pre-trained on ImageNet. They found that the models achieved their best accuracy (92.04%) at different proportions of trainable parameters, specifically 70% for InceptionV3 and 80% for Xception [15]. Another Inception model (V4) was proposed by Too et al., who truncated the network and defined a new top with an average pooling layer (8×8), dropout, and softmax, resulting in a high accuracy of 98.08% [16].
VGGNet was also favoured for its portability when applying TL. Chen et al. employed a VGGNet pre-trained on ImageNet with two Inception modules added (aliased as INC-VGGNet). Their models managed to reach above 90% accuracy in the prediction of maize and rice diseases respectively, although the proposed INC-VGGNet appeared to confuse the "Phaeosphaeria Spot" and "Maize Eyespot" diseases, which sometimes occur on the same leaf in the dataset. The authors also noted the potential negative impact of cluttered field backgrounds and uneven illumination intensity on the maize detection results (which were just above 80%) [17]. Too et al. also tried TL on VGGNet by truncating and replacing the original softmax layer of a model pre-trained on ImageNet, achieving an accuracy of 81.83% [16].
Some researchers have applied TL techniques to different types of ResNet models. For example, Ahmed and Ahmed proposed applying TL to Inception-ResNet (IncResNet) and claimed a validation accuracy of 100% at 20 epochs on palm disease detection. However, since the model was not tested on a separate set of unseen data, it is hard to say whether it would perform as well in more practical situations [18]. Other studies employing ResNet50 [19] [16] and ResNet101 [16] mostly use Stochastic Gradient Descent (SGD) as the optimization algorithm and confirm the feasibility of accuracies above 90%.
AlexNet was able to achieve decent accuracy when detecting plant diseases from images taken in lab conditions (PlantVillage). The AlexNet constructed by Ane (2023) consists of 5 convolutional layers and 3 fully connected layers, with a max-pooling layer behind each convolutional layer. The model achieved an accuracy of 95.25% during training and 87% on the validation data. The authors mention that dropout was deployed so that some neurons randomly fail during training, effectively preventing overfitting and greatly improving the model's generalization ability [20].
However, almost all of the above studies were conducted on the PlantVillage dataset, which was collected in laboratory conditions. On data obtained in the field, model accuracy appears to drop even with TL. The AlexNet proposed by Yao et al. [21] has a total of 8 layers, comprising 5 convolutional layers and 3 fully connected layers, and uses the ReLU activation function instead of sigmoid, but the final accuracy of the model after TL was only 48%. In the same study, the accuracies of other models were somewhat higher (60% for InceptionV3 and 65% for VGG16). VGG16 has a 16-layer structure, including 13 convolutional layers, and is characterized by the use of small convolution kernels (3×3) and pooling kernels (2×2). Inception-V3 replaces the original 7×7 convolution kernel with two one-dimensional convolution kernels, namely 1×7 and 7×1. The results suggest that the data were overfitted during transfer learning. The accuracy of VGG16 with TL only reached 95% after two further optimization methods were applied: data augmentation and learning rate decay.
It is worth noting that the vast majority of articles on leaf disease detection trained their models entirely or partially on the Plant Village dataset. This does not address the models' lack of resilience when applied to actual field conditions.

Methodology
This section provides an overview of important concepts applied in this research, including CNN models, optimisation methods, Transfer Learning (TL), and Domain Adaptation (DA). The introduction and discussion of the proposed methodology and models follow from the setup of these basic concepts.

CNN Models
CNN is a commonly used deep learning network. Its main features include local connections and parameter sharing, which allow for excellent performance in large-scale image processing. The basic structure of CNN models generally consists of several components: an input layer, followed by convolutional and pooling layers, then fully connected layers, and finally an output layer. The order of convolutional and pooling layers is not fixed, and the depth of the network depends on the number of layers stacked.
CNN models come in various architectures and variants. Common CNN models include MobileNet, AlexNet, VGGNet, GoogLeNet, ResNet, and so on, each with its specific structure and characteristics. MobileNet, as its name suggests, is designed for efficient deployment on mobile devices; it uses depthwise separable convolutions to achieve a balance between accuracy and low computational complexity. AlexNet features multiple convolutional and pooling layers and played a key role in popularizing deep learning for image classification. VGGNet is known for its simplicity and uniformity, utilizing small convolutional filters and a deep architecture, making it easy both to comprehend and to implement. GoogLeNet (Inception V1) was the first network to introduce the Inception module, whose parallel convolutions capture features at multiple scales. ResNet, on the other hand, introduces residual connections, which allow gradients to flow directly through layers; this alleviates the vanishing gradient problem and thus enables the training of extremely deep networks.
Due to limited time and resources, pre-trained CNN models are commonly used and applied to specific tasks through transfer learning. In the field of image classification, ImageNet is a commonly used dataset for pre-training CNN models. General feature representations learned on such large-scale datasets can then be fine-tuned to suit the requirements of specific tasks and datasets.
Proposed models & methods: For the same reasons of limited time and computational resources, we concentrate on the following pre-trained models: MobileNet, VGG16, InceptionV3, and ResNet. These models are pre-trained on large-scale image datasets such as ImageNet, which speeds up the training process and improves performance by retaining general feature extraction capabilities. These four models were selected for their varying sizes and portability; comparing them offers an approximate sense of how much a model's size and portability affect the benefit of Transfer Learning (TL).

MobileNetV2
MobileNet is designed with a focus on efficient and lightweight computations, making it particularly well-suited for mobile and embedded devices [22]. This efficiency is achieved by using depthwise separable convolutions, which divide the typical convolution operation into two steps: a depthwise convolution followed by a pointwise convolution. As shown in Fig. 1, in each depthwise convolution step, individual spatial channels are convolved with separate kernels, and in the subsequent pointwise convolution, 1×1 convolutions are applied to merge these channels. This separation of spatial and cross-channel information streamlines computation and significantly reduces the number of operations while maintaining a good level of accuracy [23].
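As an illustration, such a block can be sketched in PyTorch as below; the channel sizes and input resolution are illustrative, not MobileNetV2's exact configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Sketch of a depthwise separable block: per-channel 3x3 convolution,
    then a 1x1 pointwise convolution that mixes channels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch convolves each input channel with its own kernel
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Example: 32 -> 64 channels on a 112x112 feature map
out = DepthwiseSeparableConv(32, 64)(torch.randn(1, 32, 112, 112))
print(out.shape)  # torch.Size([1, 64, 112, 112])
```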
As shown in Table 2, MobileNet also integrates depthwise separable convolutions with Batch Normalization, which enhances training stability and accelerates convergence. This further contributes to the model's suitability for real-time and resource-constrained scenarios. MobileNetV2's architecture is complemented by various hyperparameters, such as the width multiplier and resolution multiplier, which allow trade-offs between model size and accuracy based on specific requirements. This adaptability makes MobileNetV2 suitable for a wide range of applications, from mobile devices to embedded systems and IoT devices.

VGG-16
VGG-16 is a relatively deep neural network with 16 convolutional and fully connected layers; this depth helps in extracting and learning more complex feature representations [24]. VGG-16 uses relatively small 3×3 convolutional kernels, which reduces the number of parameters, increases the network's nonlinearity, and allows multiple layers to be stacked for increased depth.
The network employs repeated blocks of convolutional and pooling layers, gradually capturing features at different levels in the image. It uses max-pooling layers to reduce the size of feature maps, decreasing computational load while retaining important features. After the convolutional and pooling layers, VGG-16 includes several fully connected layers that map feature representations to probability distributions over the classes. As shown in table 3 and Fig. 2, the VGG-16 architecture can be divided into 4 parts:
1. Input Layer: The input is a 224×224 color image (RGB channels).
2. Convolutional Blocks: VGG-16 has 5 convolutional blocks, each comprising a series of convolutional layers followed by a max-pooling layer. The convolutional layers use small 3×3 kernels followed by a ReLU activation function, and the number of convolutional layers per block increases from the shallow to the deep blocks.
3. Fully Connected Layers: After the convolutional blocks, VGG-16 has 3 fully connected layers, each with a ReLU activation function. The final fully connected layer outputs scores for the different classes.
4. Softmax Layer: After the last fully connected layer, the Softmax activation function converts the outputs into class probabilities.
In the structure above, the notation within parentheses indicates the kernel size and number of output channels, and "×2" signifies that the block is repeated twice [26]. The final fully connected layer consists of 1000 neurons, corresponding to the 1000 classes in the ImageNet dataset. In practice, VGG-16 is typically used as a feature extractor or as a base network for fine-tuning.
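For illustration, a typical fine-tuning setup for VGG-16 can be sketched with torchvision as follows; the 38-class head matches the dataset used in this paper, while freezing the whole backbone is one of several reasonable choices rather than the only option.

```python
import torch.nn as nn
from torchvision import models

# Sketch: pre-trained VGG-16 as a frozen feature extractor with a new head.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for param in vgg.features.parameters():
    param.requires_grad = False          # keep ImageNet features fixed

num_classes = 38                         # healthy/diseased leaf classes
vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, num_classes)
```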

Inception V3
Unlike traditional deep learning network models such as AlexNet and VGGNet, GoogLeNet is a deep learning network model based on Inception modules and has 22 layers (counting only layers with trainable weights). The Inception module is primarily composed of multiple filters or convolutional kernels of different scales arranged in parallel to increase the network's width. GoogLeNetV1 (InceptionV1) is constructed by stacking alternating convolutional layers, pooling layers, and InceptionV1 modules [27], and has around 5 million training parameters. With continuous improvement of the Inception module, it gradually evolved into network structures such as InceptionV2, InceptionV3, InceptionV4, and Inception-ResNet.
InceptionV3 (the model used in this article) builds upon the initial Inception module. To reduce the parameter count and enhance model speed, it decomposes a 5×5 convolutional kernel into two 3×3 kernels and an n×n kernel into successive 1×n and n×1 kernels. Additionally, InceptionV3 employs Batch Normalization in its auxiliary classifier, thereby accelerating training, and utilizes label smoothing to prevent overfitting [28]. The InceptionV3 modules are illustrated in Fig. 3: module a comprises an Inception structure with two consecutive 3×3 convolutional kernels, module b employs consecutive 1×n and n×1 kernels in place of an n×n kernel, and module c substitutes parallel 1×3 and 3×1 kernels for the 3×3 kernel. The specific network structure of InceptionV3 is provided in table 4.
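As a minimal sketch of this factorization, a 7×7 convolution can be replaced by a 1×7 convolution followed by a 7×1 convolution; the channel count and feature-map size below are illustrative.

```python
import torch
import torch.nn as nn

# Sketch: factorizing a 7x7 convolution into 1x7 followed by 7x1.
# For C input and C output channels this needs 2*7*C^2 weights
# instead of 49*C^2, while covering the same receptive field.
channels = 64
factorized = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(1, 7), padding=(0, 3)),
    nn.Conv2d(channels, channels, kernel_size=(7, 1), padding=(3, 0)),
)
x = torch.randn(1, channels, 17, 17)
print(factorized(x).shape)  # torch.Size([1, 64, 17, 17]) -- size preserved
```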

ResNet9
ResNet9, a variant of Residual Networks (ResNet), has emerged as a significant architecture within deep learning, particularly in the domain of image classification [29]. ResNet architectures are known for their deep structures, as shown in Fig. 4, and ResNet9 represents a specific configuration that maintains the fundamental principles of residual learning while offering a more compact design. The primary innovation of ResNet architectures lies in the introduction of residual connections, or skip connections. These connections allow the output of one layer to bypass one or more intermediate layers and connect directly to a later output, as illustrated in Fig. 5 for ResNet9. This facilitates the backpropagation of gradients and helps mitigate the vanishing gradient problem, a common challenge in training deep neural networks [31] [32]. In ResNet9, the network is organized into blocks, each consisting of a specific sequence of layers. A typical block includes a convolutional layer followed by batch normalization, a ReLU activation function, and then another convolutional layer. The residual connection adds the input of the block to the output of the last convolutional layer. This structure allows the network to learn residual functions, focusing on the difference between the input and output of a block rather than the complete mapping.
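A minimal PyTorch sketch of such a residual block, with illustrative channel sizes rather than the exact ResNet9 configuration, is given below.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch: conv -> BN -> ReLU -> conv -> BN, plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # add block input back: residual learning
```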

Figure 5. Skip connections in ResNet9
The use of 3×3 convolutional kernels in ResNet9 reduces the number of parameters, enhancing computational efficiency. The architecture also involves pooling layers, allowing a gradual reduction in spatial dimensions as the network deepens, and fully connected layers towards the end of the network [31].
One of the main advantages of ResNet9 is its adaptability. The architecture can be tailored to various applications by adjusting hyperparameters such as the number of blocks, the number of convolutional filters, and the use of additional mechanisms like weight decay or gradient clipping. In our implementation, we leveraged these characteristics to process a dataset of plant images, applying various image transformations and training strategies to achieve effective disease classification.
In summary, ResNet9 offers a balanced trade-off between complexity and computational efficiency. Its design, rooted in the concept of residual learning, enables the training of robust models capable of capturing intricate patterns within images. The flexibility and depth of ResNet9 make it a suitable choice for a wide range of tasks, extending from simple image recognition to more complex analyses in various domains, including agriculture.

Transfer Learning
To apply traditional machine learning methods, two preliminary conditions must hold: first, the training data and the test data are assumed to follow the same distribution; and second, a large amount of labeled data is available. However, a fundamental issue in practical applications is that there is a tremendous amount of high-quality, labelled source data but very little target data. In this case, the transfer learning approach may be utilized to recognize and classify target data with supplementary training on the source data.
Transfer learning is a machine learning approach that applies knowledge gained from one task to another related task in order to improve performance. In transfer learning, a model learns from a source domain and transfers its knowledge to a different target domain; there is usually some resemblance or correlation between the two domains. By utilizing information from the source domain, learning in the target domain may be accelerated and the model's performance enhanced. Transfer learning involves various techniques such as feature extraction, fine-tuning, and model adaptation. In this article, transfer learning is implemented using the fine-tuning technique.
In our approach, we start with a model trained on the source domain and then train it further on data from the target domain. Transfer learning speeds up learning in the target domain and enables the model to converge more quickly. Because the data in the target domain, namely labelled images of leaves in the wild in our research, is limited, transfer learning plays an essential role in helping us train the model and enhance its performance.

Domain Adaptation
Domain adaptation (DA), a subset of transfer learning, refers to the task of adapting a model from a source domain to a target domain. In DA, the source and target domains usually have distinct data distributions, which may degrade model performance on the target domain. The objective of DA is to improve the model's performance on the target domain by utilizing knowledge from the source domain, bypassing the need for extensive data collection and labeling in the target domain. This is especially useful when target-domain data is scarce or expensive. Domain adaptation strategies aim to align the distributions of the source and target domains, either by minimising the distribution discrepancy or by adjusting the model's parameters. Common DA methods include feature selection and transformation, inter-domain adversarial training, and generative adversarial networks (GANs). These are of great significance in many practical applications, such as object detection, image classification, and face recognition, where DA can enhance the model's generalization capacity and performance across different situations and environments.
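As a condensed sketch of the adversarial flavour of DA (in the style of DANN), a gradient reversal layer can be inserted between a shared feature extractor and a domain classifier, so that features are trained to fool the domain classifier while still supporting the label classifier. The layer sizes and the two-domain (lab vs. field) setup below are illustrative assumptions, not our exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DANNSketch(nn.Module):
    def __init__(self, feature_dim=256, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(),
                                      nn.LazyLinear(feature_dim), nn.ReLU())
        self.label_head = nn.Linear(feature_dim, num_classes)  # disease classes
        self.domain_head = nn.Linear(feature_dim, 2)           # lab vs. field

    def forward(self, x, lamb=1.0):
        f = self.features(x)
        # Class prediction plus an adversarial domain prediction
        return self.label_head(f), self.domain_head(GradReverse.apply(f, lamb))
```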

Experiments
This section comprehensively explains how the proposed DCNN models were constructed and trained, along with details about the experimental setup.
The proposed workflow for plant leaf disease detection begins with data preparation and finishes with model prediction. Each phase of this workflow is addressed in detail in a corresponding subsection. The next subsection discusses the mechanics behind dataset preparation and data preprocessing. The optimization strategies used to train each model follow. Finally, the results of applying different optimization methods to distinct models are summarized and analyzed.

Dataset & Environment Setup
Tasks involving data preparation, preprocessing, model design, and model prediction are conducted on the Kaggle and Colab platforms, configured with a 2-core CPU and 13 GB of random access memory (RAM), along with P100 and T4 GPUs for expediting the training of the DCNN models. Python 3.9.13, TensorFlow 2.13.0, PyTorch 2.0.1, NumPy 1.21.5, and matplotlib 3.5.2 are used in the implementation of the proposed workflow and DCNN models.
Images of diseased and healthy leaves are acquired from several public data archives and fall into two categories: lab data, collected in a controlled laboratory setting, and field data, manually captured with varied lighting, angles, surfaces, and noise. The New Plant Diseases Dataset from Kaggle [33] is derived from the Plant Village Dataset by data augmentation, using methods including flipping, rotation, scaling, zooming, colour jittering, and brightness and contrast adjustment.
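For illustration, these augmentation families can be expressed with torchvision transforms as below; the exact parameters used to construct the Kaggle dataset are not published, so the values shown are assumptions.

```python
from torchvision import transforms

# Sketch of the augmentation families listed above (illustrative parameters).
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),                    # flipping
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=30),                # rotation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # scaling / zooming
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2),               # colour jittering
    transforms.ToTensor(),
])
```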
The Plant Village Dataset [34] consists of 14 distinct plant species, each with both healthy and diseased classes. The 54,303 healthy and unhealthy leaf images in Plant Village produce 87,000 augmented images, with a relatively even distribution among classes. These images are divided into training and validation sets at an 8:2 ratio. This dataset is central to the preliminary experiments, in which models are selected and optimisation and other strategies are tested. After these trials, because of the limited field data obtained, the subsequent training and experiments focus on a single species: apple. The class names and sample images of the proposed dataset are shown in table 5 and Fig. 6 respectively. The field dataset is obtained from the Plant Pathology Challenge 2020 [35]: a total of 3,651 high-quality, real-life symptom images of multiple apple foliar diseases, along with healthy apple leaves, manually captured in apple orchards in the U.S. To make this dataset comparable to the previous one, we also divide it into training and validation sets at an 8:2 ratio. The content of the dataset is summarized in table 6 and Fig. 7.

Training & Implementation Procedures
Several standard optimization techniques are applied when training the models.
1) Learning Rate Scheduling: The learning rate is adjusted over the course of training according to a schedule. Common schedules include:
-Step Decay: The learning rate is reduced by a factor after a fixed number of epochs or steps. This allows for larger initial learning rates that are then reduced as the optimization progresses.
-Exponential Decay: The learning rate is reduced exponentially after each epoch or step, leading to a gradual decrease.
-Cosine Annealing: The learning rate follows a cosine function, oscillating between higher and lower values. This can help the optimization escape local minima (see the sketch below).
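These three schedules map directly onto PyTorch's built-in schedulers, as in the following sketch; the step sizes, decay factors, and annealing period are illustrative values.

```python
import torch
from torch import optim

# Minimal sketch: the three learning rate schedules discussed above.
model = torch.nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)

step_decay = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # 10x cut every 10 epochs
exp_decay = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)         # multiply by 0.95 each epoch
cosine = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)          # cosine cycle over 20 epochs

# In practice one scheduler is chosen and stepped once per epoch:
# for epoch in range(num_epochs):
#     train_one_epoch(...)
#     cosine.step()
```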
2) Regularization: Regularization techniques are used to prevent overfitting by adding constraints to the optimization process. Overfitting occurs when the model fits the training data too closely and performs poorly on new, unseen data. Two common regularization methods are:
-L1 and L2 Regularization: L1 regularization adds the absolute values of the weights to the loss function, promoting model sparsity, while L2 regularization adds the squared weights, penalizing large weights and thus preventing excessively intricate models.
-Dropout: Dropout is a technique where randomly selected neurons are ignored during training with a certain probability. This forces the network to learn more robust features by preventing any single neuron from becoming overly specialized to a particular feature.
3) Normalization: Normalization techniques aim to ensure that the input features or activations of the network have a consistent scale, which helps with faster convergence and better generalization. Common normalization methods include:
-Batch Normalization: Batch normalization normalizes the activations within a layer over a minibatch of samples. It helps stabilize the optimization process by reducing internal covariate shift and can improve the training of deep networks.
-Layer Normalization: Layer normalization normalizes a layer's activations like batch normalization, but instead of operating over the batch dimension, it operates across the feature dimension. This makes it more suitable for recurrent networks and situations where batch sizes are small.
-Instance Normalization: Instance normalization is similar to batch normalization, but it normalizes the activations of each instance (sample) separately. It is often used in style transfer and generative models.
4) Early Stopping: Early stopping monitors a validation metric and halts the training process when performance ceases to improve. This method helps prevent overfitting and saves computational resources. Key aspects include:
-Validation Monitoring: Tracks performance on a validation set to gauge generalization.
-Patience Parameter: Defines the number of epochs without improvement before halting training.
-Best Weights Restoration: Option to save and restore the model's best weights.
-Usage: Effective across various neural networks, aiding efficient training and robust generalization to unseen data. A minimal sketch follows.
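The sketch below shows early stopping with a patience parameter and best-weights restoration; `train_one_epoch` and `validate` are hypothetical placeholders for the actual training and evaluation code.

```python
import copy

# Sketch: early stopping with patience and best-weights restoration.
# `model`, `train_one_epoch`, and `validate` are placeholders (assumptions).
best_acc, best_weights, patience, wait = 0.0, None, 5, 0
for epoch in range(100):
    train_one_epoch(model)
    val_acc = validate(model)      # validation monitoring
    if val_acc > best_acc:
        best_acc, wait = val_acc, 0
        best_weights = copy.deepcopy(model.state_dict())  # save best weights
    else:
        wait += 1
        if wait >= patience:       # patience epochs without improvement
            break
model.load_state_dict(best_weights)  # restore the best weights
```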

MobileNetV2:
During the training process, the parameters in the feature extraction layers (all the convolutional layers) are frozen and only the weights of the classification layer are updated. The batch size was set to 50, the dropout probability was initially set to 0.5, and the learning rate was initially set to 0.001. Training was finalized at 20 epochs.
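A sketch of this setup with torchvision is shown below; the frozen convolutional backbone, the dropout probability of 0.5, and the learning rate of 0.001 follow the text, while the 4-class output head and the choice of the Adam optimizer are assumptions of the sketch.

```python
import torch
import torch.nn as nn
from torchvision import models

# Sketch: freeze MobileNetV2's feature extractor, train only a new classifier head.
net = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
for p in net.features.parameters():
    p.requires_grad = False             # convolutional layers stay frozen

net.classifier = nn.Sequential(
    nn.Dropout(p=0.5),                  # dropout probability from the text
    nn.Linear(net.last_channel, 4),     # assumed 4-class apple head
)
# Only the classifier's weights are updated (optimizer choice is an assumption)
optimizer = torch.optim.Adam(net.classifier.parameters(), lr=0.001)
```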
The model is trained using two different strategies. The first simply trains two models on the two different datasets; the two models are then combined linearly with weights based on the relative sizes of the two datasets. Several optimization methods were tested, and the combination that yields the best result (76.59%) uses an exponentially decaying learning rate together with batch normalization.
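A sketch of such a linear ensemble is given below; weighting softmax outputs by the relative dataset sizes is our illustrative reading of the weighting scheme, and `lab_model` and `field_model` are placeholders for the two trained networks.

```python
import torch

# Sketch: combine two models' predictions with weights proportional to
# the sizes of their training sets (counts from the dataset sections).
n_lab, n_field = 87_000, 3_651
w_lab = n_lab / (n_lab + n_field)
w_field = 1.0 - w_lab

@torch.no_grad()
def ensemble_predict(lab_model, field_model, images):
    probs = (w_lab * torch.softmax(lab_model(images), dim=1)
             + w_field * torch.softmax(field_model(images), dim=1))
    return probs.argmax(dim=1)   # predicted class per image
```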
The second methodology employs domain adaptation, where data from the source and target domains are used jointly in the training process. The model is trained to output both the class label and the domain label, with the aim of minimizing domain discrepancy by adjusting the model's weights. The observation that test accuracy falls while training and validation accuracy keep increasing indicates overfitting; therefore, the dropout probability was increased to 0.7. As illustrated in table 7, the model trained with this method achieved a slightly higher test accuracy of 76.70%. Moreover, the model trained using DA performs much better than the integrated model in general, as it achieves a relatively decent accuracy on both the lab and field test sets, indicating higher robustness to environmental variability.
InceptionV3: The results of the original model and the improved InceptionV3 model under different training mechanisms are shown in Table 9. The first method adopts direct training. The improved InceptionV3 model, combined with transfer learning and a data augmentation strategy, improves recognition accuracy on the two datasets from different environments and alleviates overfitting during training. The recognition accuracy of the original training method in the laboratory environment is only 69.22%, while the improved InceptionV3 combined with transfer learning reaches 94.35% in the laboratory environment and 73.73% in the field environment. This shows that InceptionV3's feature extraction ability on large-scale image datasets is strong and can be applied to specific task training.
ResNet9: In this experiment, two distinct methodologies were employed to train the model for our targeted task, with the results illustrated in table 10. The first approach involved a model ensemble, utilizing two models trained on the source domain (laboratory images) and the target domain (field images) respectively; these models were then integrated through a weighted allocation, based on the sizes of the datasets, to execute predictions. The second approach was joint training, wherein both the source and target domains were concurrently trained on a pre-trained model, allowing the model to independently extract and combine the relevant features. The prediction results of the initially trained models are also provided for comparison. Notably, the original models, especially the lab model, demonstrated commendable prediction accuracy on their corresponding test sets, while the field model exhibited moderate prediction performance on its test set. However, the lab model's predictive capability on the target-domain (field) test set was markedly inferior. The findings clearly indicate a substantial improvement in the prediction results on the target domain's test set, regardless of whether model ensemble or joint training was employed. These techniques yielded predictions for the target-domain test set that closely approximated, or even surpassed, the field model's performance on its own test set, demonstrating the effectiveness of integrating domain-specific models and of joint training in enhancing prediction accuracy for diverse and challenging datasets.
As illustrated in Table 11, the four models, each initially trained on the source domain (lab), performed rather unsatisfactorily on the target-domain (field) test set, with none surpassing the 60% threshold. This outcome prompted a thorough investigation of optimization strategies tailored to each model. By meticulously applying combinations of techniques and iterative refinements, substantial improvements were achieved in the prediction accuracy of all four models. This enhancement underscores the importance of fine-tuning in model development and demonstrates the potential of adapting models across domains to achieve coherent and reliable predictions. The evaluation revealed that ResNet9 outperformed the other three models, achieving the highest accuracy on both the validation set (99.1%) and the test set (86.4%), surpassing previously reported methodologies.

Outlook & Further Exploration
Several avenues present themselves for further exploration and development within our research.
Incorporation of Additional Data Sources: By systematically integrating data related to climatic conditions and soil information, we can delve deeper into the multifaceted dynamics of plant diseases. These additional dimensions allow for a more nuanced understanding, connecting weather patterns, soil quality, and other environmental factors to disease propagation. This holistic approach promises to significantly enhance the predictive accuracy and robustness of our models, paving the way for more informed agricultural decision-making.
Enhancement of Preprocessing Abilities: The extension of our preprocessing capabilities is a critical step towards more complex and detailed pathological analyses. By accepting various image types, including grayscale, infrared, and others, we can explore different visual characteristics and spectral properties of plant diseases. This broadened scope allows for more comprehensive examinations, potentially uncovering novel insights into disease manifestations and behaviors. Further research in this area could lead to more precise detection methods and contribute to the development of innovative diagnostic tools.
Development of User-Friendly Tools and Platforms: Our vision extends beyond academic research to the creation of practical, user-friendly tools that can be readily applied in the agricultural field. By leveraging detection cameras and online platforms, we aim to translate our research into tangible solutions for farmers and agricultural organizations. This could involve real-time disease detection, automated reporting, and the provision of targeted interventions. Such a system would not only empower farmers with the knowledge to address plant diseases promptly but also foster collaboration between researchers, practitioners, and policymakers in the pursuit of sustainable agriculture.
These directions align with our commitment to advancing intelligent and sustainable farming practices and bear significant implications for global food security and environmental stewardship. We hope to contribute to a broader discourse in the future, recognizing the complexity of modern agriculture and emphasizing the need for comprehensive solutions that serve both humans and the environment.

Conclusion
In this study, a transfer learning approach was explored to devise a strategy for identifying and detecting diseases in plant leaves. A comparative analysis of four pre-trained models was conducted, focusing on their application as feature extractors. To mitigate the issue of imbalanced data, image preprocessing and augmentation techniques were utilized. Experiments were carried out on a dataset consisting of over 9,000 images across 4 classes of healthy and afflicted plant leaves, in addition to a more specific target-domain dataset containing over 4,000 images in 4 categories.
We employed a variety of optimization techniques to improve the prediction accuracy of the models on the target-domain test set. For instance, hyperparameter tuning played a pivotal role in fine-tuning model accuracy, encompassing adjustments to learning rate scheduling, weight decay, cosine annealing, batch size, epoch number, regularization techniques, and more. Such meticulous tuning and methodological combinations required substantial time and effort but were instrumental in achieving optimal performance and thwarting overfitting.
Furthermore, the incorporation of techniques like transfer learning and domain adaptation markedly increased the models' adaptability, thereby improving their predictive capability across diverse domains. These methods endowed the models with robustness, enabling satisfactory predictive outcomes across different contexts.
Complementary strategies such as early stopping, process monitoring, and visualization facilitated a more nuanced adjustment of the models. By conducting systematic inspections at each iteration, the likelihood of overfitting was significantly reduced.
Our best-performing model, ResNet9, scored a test accuracy of 86.4%, a 30.8% improvement over its previous performance. This demonstrates that the aforementioned measures, applied collectively, contributed to a comprehensive optimization process, yielding a model that not only excels in accuracy but also exhibits versatility and resilience against common challenges in machine learning.

Figure 6. New Plant Diseases Dataset - Apple: Sample images

Figure 9. Training Specifications of each model

Inception V3: Since the images in the test set are not involved in training the model, the model's accuracy on the test set is selected as the evaluation standard. When defining the top of the model, the output of the base model is connected to a global average pooling layer, followed by feature extraction through a fully connected layer with 256 neurons, and the Softmax activation function is used to generate category predictions. During training, the pre-trained layers are frozen to prevent their weights from being updated, so that only the weights of the top layers are updated during subsequent training. The specific results of the original model and the improved InceptionV3 model under different training mechanisms are shown in Table 9.

Table 1. State of the art

Table 5. New Plant Diseases Dataset - Apple: Summary

Table 6. Plant Pathology Challenge Dataset - Apple: Summary

Table 8. VGGNet16 test results
Among the 8 experiments, the third group showed that accuracy on the training set reached 91.41%, while accuracy on the cross-validation set remained at 81.07% and did not increase. The accuracy on the test set was only 69.30%, indicating overfitting. After testing, the best test-set performance was obtained by training all layers of the model.

Table 11. Comparison of results from different models