Research on crop disease recognition based on Multi-Branch ResNet-18

Traditional image processing methods for crop disease identification suffer from complicated manual feature design and low efficiency. This article studies the application of deep learning algorithms to crop disease identification. An attention mechanism and feature fusion are introduced to optimize ResNet-18, and, to address the limitation that the network has only a single output, a Multi-Branch ResNet-18 (MB-ResNet-18) and a joint loss function are proposed on top of the optimized ResNet-18 to classify the crop-type level, disease-type level, and disease-degree level simultaneously. The experimental results show that, compared with ResNet-18, the proposed network structure essentially maintains the classification accuracy at the crop-type and disease-type levels, while the classification accuracy at the disease-degree level increases by 2.11%.


Introduction
China is a large country with a population of 1.4 billion, and food security is fundamental to the national economy and the people's livelihood. With the continuous progress of information technology, big data and high-performance computing hardware have opened a new direction for the transformation of modern agricultural technology: integrating artificial intelligence, big data, and other high technologies into traditional agricultural production, from which "smart agriculture" came into being. The basic process of traditional computer vision for identifying crop leaf diseases is: extract the crop leaf or diseased area from the original image; extract features such as edges, colors, and textures from that region; manually select and design effective feature variables; and build a classifier with traditional computer vision methods to recognize the disease. However, this method requires a lot of manpower, is inefficient, and has low accuracy.
With the rapid development of computer vision and neural networks, research on crop leaf disease recognition algorithms based on them has received extensive attention from scholars both at home and abroad [1]. Domestically, Sun Jun et al. [2] used batch normalization and global average pooling to improve the AlexNet model and trained it on the PlantVillage data set, which alleviated AlexNet's slow convergence and large number of parameters and improved its prediction accuracy. Zhang Jianhua et al. [3] froze part of the layers of a pre-trained VGG convolutional neural network and fine-tuned the remaining layers to classify five diseases and normal leaves of cotton, reaching a recognition accuracy of 89.51%. Abroad, Sladojevic et al. [4] used a convolutional neural network to recognize plant leaf diseases, and the authors of [5] used a pre-trained, fine-tuned AlexNet model to identify 26 diseases of 14 crops in PlantVillage, achieving a classification accuracy of more than 90% on the test set.
Existing neural-network-based algorithms for identifying crop leaf diseases mostly recognize only a few types of diseases and cannot distinguish the degree of disease, so they provide little reliable reference for intelligent pesticide application. Most of them can only predict the label of a single level and cannot achieve multi-level prediction of the crop-type, disease-type, and disease-degree levels. To address these shortcomings of existing research, this article proposes the MB-ResNet-18 network based on ResNet-18 [6], which uses different feature maps to predict the crop-type, disease-type, and disease-degree respectively, together with a joint loss function to train it, achieving better classification accuracy on the final classification task.

Structural design of MB-ResNet-18
This article proposes a new multi-branch network model based on ResNet-18, which simultaneously predicts the multi-level labels of an image from different feature maps within the model.

Network structure design
Each convolutional layer of a convolutional neural network has several convolution kernels that learn the local spatial connection pattern across all channels. That is, the kernels in a convolutional layer extract fused spatial and channel information within the local receptive field; together with nonlinear activation layers and down-sampling layers, a CNN obtains a hierarchical pattern with a global receptive field as the description of the image. Some works have shown that network performance can be improved by adding a learning mechanism that helps capture correlations between feature-map channels. Evaluating the disease degree often depends on tiny features, so this article adds the attention mechanism from SE-Net [7]. By integrating the global information of each channel of the feature map extracted by the convolutional layers, the weights of the channels are rescaled, their importance is distinguished by the different weights, and the channels the network needs to attend to are obtained. The attention block can adaptively change the weight of each channel; by rescaling the importance of the feature-map channels, the classification accuracy of the crop disease-severity assessment task can be improved. The structure of the attention mechanism is shown in figure 2.
In addition, the feature maps extracted by layer2, layer3, layer4, and layer5 are fused across layers through channel concatenation to make the final feature representation as rich as possible and further improve the classification accuracy of the disease-degree assessment task. On this basis, the feature outputs of layer3 and layer4 are additionally selected as nodes, adding two longitudinal connections.
After global pooling, these two groups of feature outputs are sent to classifiers to perform the classification tasks of the crop-type and disease-type levels. The specific structure of MB-ResNet-18 is shown in figure 3.
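The structure described above can be sketched in PyTorch as follows. This is an illustration only: plain convolutional stages stand in for the four residual groups of ResNet-18 (with the standard 64/128/256/512 channel widths), and the placement of the SE block after the last group is our assumption, since figure 3 is not reproduced here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: rescale each channel by a learned weight."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x):
        # squeeze: global average pooling integrates each channel's global information
        w = x.mean(dim=(2, 3))
        # excitation: two FC layers produce the rescaling weights
        w = self.fc(w).view(x.size(0), -1, 1, 1)
        return x * w  # rescale the feature-map channels

def stage(cin, cout):
    """Stand-in for one convolution group of ResNet-18."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=2, padding=1),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class MBResNet18Sketch(nn.Module):
    def __init__(self, n_crops=9, n_diseases=24, n_degrees=2):
        super().__init__()
        self.stem = stage(3, 64)
        # the paper's layer2..layer5
        self.layer2, self.layer3 = stage(64, 64), stage(64, 128)
        self.layer4, self.layer5 = stage(128, 256), stage(256, 512)
        self.se = SEBlock(512)
        # branch heads: layer3/layer4 outputs for crop-type and disease-type,
        # channel-concatenated features of all four groups for disease-degree
        self.head_crop = nn.Linear(128, n_crops)
        self.head_disease = nn.Linear(256, n_diseases)
        self.head_degree = nn.Linear(64 + 128 + 256 + 512, n_degrees)

    def forward(self, x):
        f2 = self.layer2(self.stem(x))
        f3 = self.layer3(f2)
        f4 = self.layer4(f3)
        f5 = self.se(self.layer5(f4))
        gap = lambda f: f.mean(dim=(2, 3))  # global average pooling
        fused = torch.cat([gap(f2), gap(f3), gap(f4), gap(f5)], dim=1)
        return (self.head_crop(gap(f3)),
                self.head_disease(gap(f4)),
                self.head_degree(fused))
```

A forward pass on a 224×224 image batch yields three logit tensors, one per classification level.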

Joint loss function design
Literature [8] pointed out: "Multi-task learning improves generalization performance by using the domain knowledge contained in the supervision signals of related tasks." Hard parameter sharing is the most common form of multi-task learning in neural networks, and literature [9] proves that the more tasks are learned at the same time, the more the trained model must capture a representation shared by all of them, and the smaller the risk of overfitting on the original task.
Based on this, this article combines the loss functions of the three classification tasks into a joint loss function and uses the correlation between the tasks to let them promote each other. While the network uses the feature maps of different convolution groups to predict labels at different levels, the recognition accuracy of the model is improved and its convergence is accelerated. The joint loss function is given in formula (1).
L = αL_Layer3 + βL_Layer4 + γL_MC (1)
where L represents the proposed joint loss function; α, β, and γ are hyperparameters used to adjust the proportions of the individual losses; L_Layer3 and L_Layer4 are cross-entropy loss functions that compute the loss on the outputs of Layer3 and Layer4; and L_MC is the MC-Loss, whose discriminative component L_dis and diversity component L_div are detailed in formula (2).
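A minimal sketch of such a joint loss follows, assuming three branch outputs and their level labels. The MC-Loss term is replaced here by plain cross-entropy on the disease-degree branch as a placeholder, since its discriminative and diversity components are only defined in formula (2).

```python
import torch
import torch.nn.functional as F

def joint_loss(out_crop, out_disease, out_degree,
               y_crop, y_disease, y_degree,
               alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three per-level losses, in the shape of formula (1)."""
    l_layer3 = F.cross_entropy(out_crop, y_crop)        # crop-type branch (Layer3)
    l_layer4 = F.cross_entropy(out_disease, y_disease)  # disease-type branch (Layer4)
    # placeholder for the MC-Loss on the disease-degree branch
    l_mc = F.cross_entropy(out_degree, y_degree)
    return alpha * l_layer3 + beta * l_layer4 + gamma * l_mc
```

Because the three terms are summed into one scalar, a single backward pass trains all branches and the shared backbone together.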

Data sets introduction
This article uses part of the crop leaf disease data set from the 2018 AI_Challenger, which contains 9 kinds of plants and 15 kinds of diseases; each disease is divided into two severity levels, giving 39 classes in total. The training set contains 16,931 images and the validation set contains 2,716 images. The training and validation sets are mixed and randomly divided into ten parts, nine of which are used as the training set and the remaining one as the validation set. Part of the experimental data is shown in figure 4.
To verify the proposed MB-ResNet-18, each image's label is expanded from a single label into a multi-level form covering the crop-type, disease-type, and disease-degree levels. The crop-type level consists of 9 labels, without distinguishing disease type; the disease-type level consists of 24 labels, namely 15 disease labels and 9 health labels, without distinguishing disease degree; the disease-degree level consists of two labels, general and severe, without distinguishing disease type.
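This expansion amounts to a lookup from each flat class to its three level labels. The class names below are illustrative, not the data set's actual identifiers:

```python
# Illustrative mapping from a flat class to (crop, disease, degree) labels;
# healthy classes carry no disease degree.
FLAT_TO_LEVELS = {
    "apple_healthy":      ("apple", "healthy", None),
    "apple_scab_general": ("apple", "scab", "general"),
    "apple_scab_severe":  ("apple", "scab", "severe"),
}

def expand_label(flat_class):
    """Return the crop-type, disease-type, and disease-degree labels."""
    return FLAT_TO_LEVELS[flat_class]

def level_vocab(level_index):
    """Distinct labels present at one level, e.g. 0 -> crop types."""
    return sorted({v[level_index] for v in FLAT_TO_LEVELS.values()
                   if v[level_index] is not None})
```

Over the full data set, `level_vocab` would yield the 9 crop labels, the 24 disease-type labels, and the 2 degree labels described above.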
The original data set has a total of 19,647 images, but the class distribution is extremely unbalanced: the largest class, healthy potato, has 1,493 images, while the general and severe cherry powdery mildew classes have only 119 images each. To alleviate

Model parameter setting
The experiments were run on a 64-bit Ubuntu 20.04 system, and all programs were written in Python under the PyTorch framework, on a machine with 32 GB of memory, an Intel® Core™ i5-8500 CPU @ 3.00 GHz, and an NVIDIA RTX 3090 24 GB graphics card. Hyperparameters play a very important role in training. In this experiment, the Adam optimizer is selected to update the network parameters. L2 regularization and Dropout are used to reduce over-fitting, with the regularization parameter set to 0.0001 and the Dropout probability set to 0.5. The initial learning rate is 0.001 and is divided by ten every 15 epochs. The network is trained in mini-batches with a batch size of 64 for 100 epochs, and α, β, and γ in the joint loss function are all set to 1.
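Under those settings, the optimizer and learning-rate schedule can be configured as follows. The model here is a stand-in classifier head; the paper does not specify where Dropout is placed, so its placement is our assumption.

```python
import torch
import torch.nn as nn

# stand-in classifier head with Dropout p = 0.5
model = nn.Sequential(nn.Flatten(), nn.Dropout(p=0.5), nn.Linear(512, 39))

# Adam with L2 regularization via weight_decay = 0.0001, initial lr = 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# divide the learning rate by ten every 15 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)
```

Calling `scheduler.step()` once per epoch reproduces the schedule: after 15 epochs the learning rate drops from 0.001 to 0.0001.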

Experimental results
This article uses accuracy to evaluate the performance of the model. The accuracy index is defined as follows:

Accuracy = (number of correctly classified samples / total number of samples) × 100% (3)
Analysis of the experimental results shows that the prediction accuracy of the MB-ResNet-18 network at the crop-type level is slightly lower than that of the original ResNet-18, because MB-ResNet-18 uses the feature output of a shallower convolution group to predict the crop-type level, while its prediction accuracy at the disease-type and disease-degree levels is higher than that of ResNet-18. MB-ResNet-18 achieves prediction accuracies of 93.97% and 89.99% at the disease-type and disease-degree levels, respectively, an increase of 0.51% and 2.11% over the original ResNet-18.
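As a check of the metric, top-1 accuracy in percent is simply:

```python
def accuracy(predictions, labels):
    """Percentage of samples whose predicted class matches the true class."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```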
To verify the stability of the prediction accuracy of MB-ResNet-18, a ten-fold cross-validation experiment was performed. The data set samples are randomly divided into ten parts of equal size; each training run uses nine of them as the training set and the remaining one as the validation set. The experimental results in table 1 use data sets 0 to 8 as the training set and data set 9 as the validation set.
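The split procedure described above can be sketched as follows; the seed and the index-based representation are assumptions for illustration.

```python
import random

def ten_fold_splits(n_samples, seed=0):
    """Shuffle sample indices once, cut them into ten near-equal folds,
    and yield (train, validation) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::10] for i in range(10)]
    for k in range(10):
        val = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, val
```

Each of the ten models is then trained on one `train` list and evaluated on the corresponding `val` list.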
The results of the MB-ResNet-18 ten-fold cross-validation are shown in table 2. The lowest crop-type accuracy, 95.47%, appears on data set 9, and the highest, 96.55%, on data set 2. The lowest disease-type accuracy, 92.79%, appears on data set 0, and the highest, 94.44%, on data set 1. The lowest disease-degree accuracy, 89.14%, appears on data set 5, and the highest, 91.45%, on data set 8. The fluctuation of the accuracy is small, showing that the results are stable.