Optical-imaging detection of apple sugar content based on OpenCV and PyTorch

Based on Rayleigh scattering and Mie scattering, a scattering model of apple pulp tissue is established theoretically, and the relationship between the sugar content of the apple pulp and the light intensity distribution of the diffuse reflection image is derived. First, a homemade reflection-collection system is used to collect the scattered light intensity distribution of 133 apples with different varieties, skin colors, and sugar contents. The 133 original images are then processed with OpenCV: the appropriate region is extracted from each image to build a new image set, and the features of this image set are derived with a denoising autoencoder neural network, which reduces the dimensionality of the data. The image features and the recorded fructose content table are combined into a data set, which is divided into a training set and a test set at a ratio of 3:1. Finally, by training the neural network, non-destructive prediction of the apple sugar content is achieved within a certain range.


Introduction
In the past decades, various non-destructive testing techniques have been presented for predicting fruit sugar content, most of which are based on spectral analysis of substance type and quality, including UV, visible, and NIR spectroscopy [1]. Specifically, near-infrared (NIR) spectroscopy is commonly used to determine apple varieties and sugar content [2]. Raman spectroscopy is also one of the methods for non-destructive testing of fructose content [3]. Apart from low-field magnetic resonance technology for non-destructive fruit testing [4], the ideas behind the above methods all fall within the scope of spectroscopy. Recently, the combination of non-contact optical imaging and data analysis has received increasing attention and commercial development for sugar content detection [5]. Most of these approaches rely on hyperspectral imaging because of its unique advantages; for example, the collected image samples contain enough information related to the content and concentration of fructose. However, the abundance of information in hyperspectral imaging data is a double-edged sword. Although the acquired sample image set contains enough effective information, extracting the features related to that information from a series of pictures remains very difficult, because the continuous spectral images carry excessive imaging information. The vast amount of data in hyperspectral imaging makes dimensionality reduction in the mathematical modeling and optimization of model speed unavoidable difficulties.
In this article, we propose a method for non-destructive detection of apple sugar content by analyzing the diffuse reflection light intensity distribution of the apple surface image under a single-wavelength light source. According to the optical principles of scattering in biological tissue, a theoretical mathematical model relating diffuse reflection light intensity to fructose concentration was established, and an experimental light path was then designed. Images of 133 apple samples were collected, together with the percentage of carbohydrate content of the juice from the top of the apple corresponding to each image number. OpenCV was used to preprocess the 133 images: the region of interest (ROI) in each image is identified automatically and extracted and stored dynamically, and the size and area of the extracted ROI are optimized according to neural network theory. Using PyTorch, an autoencoder neural network was built to extract the light intensity distribution features of the ROI. Because the structure of the data set collected in the experiment differs from that of conventional neural network training data sets, we wrote a customized data loader for this data set and trained a linear neural network that makes reasonable use of the image features, achieving non-destructive prediction of apple sugar content within a certain range.

Establishment of a biological model of pulp cell
Pulp cells, organelles, and many biological macromolecules that can interact with light can all be regarded as spherical, so the pulp tissue can be modeled as an aggregate of discrete, uniform spherical particles whose refractive index differs from that of the surrounding environment (cell wall, tissue fluid, etc.). Calculating the scattering characteristics of this pulp tissue model requires combining Rayleigh scattering theory and Mie theory. The scattering coefficients of discrete particles in biological tissues that meet the Rayleigh scattering conditions are expressed as follows:

In the above formula, $\mu_s$, $n_s$ and $n_m$ are the scattering coefficient, particle refractive index and medium refractive index, respectively. Let the refractive index of the solution be $n_{mc}$ when the fructose concentration is $c$. The change in refractive index caused by the fructose solution is relatively small, and can be roughly regarded as:

In summary, a change in fructose concentration simultaneously changes the absorption coefficient and the scattering coefficient among the optical parameters of biological tissue. From Equation (4), it can be seen that a change in fructose concentration causes the diffuse reflection light intensity to change. The change in the light intensity of the diffuse reflection image has a linear relationship with the fructose concentration. When the concentration of fructose increases, the backscattered light intensity decreases correspondingly.

Image acquisition experiment
The image acquisition device is mainly used to acquire an image of the part to be measured on the top of the apple. The schematic diagram and physical map are shown in Figure 1. A 650 nm single-band laser source is used as the illumination light source, and an aperture-adjustable color CCD with a resolution of 768×576 is used for image acquisition. To reduce the interference of ambient light during image acquisition, the illuminating light intensity is maximized within the acceptable threshold range of the CCD and the images are collected at night; the aperture of the CCD is then adjusted so that the reflected light intensity distribution at the top of the apple is more prominent in the collected images. When setting up the experimental environment, the CCD and the laser light source are kept at the same level, and the angle of the line connecting the sample and the light source and the angle of the line connecting the sample and the CCD are both kept at about 45°.
133 apples of different types, sizes and peel colors were used as samples for the corresponding image and sugar content collection experiments. After the 133 samples were numbered, the apple images under 650 nm laser irradiation were collected first, and the percentage of fructose content in the juice of a slice from the top of each sample was then measured and recorded under the sample's number.

Image processing with OpenCV
Even after CCD digital noise reduction and aperture adjustment, the acquired images cannot be used directly in the mathematical model built on the computer. The reasons are as follows:
(1) Since a completely dark room cannot be guaranteed, the background differs between images taken under different lighting conditions, and the noise in the background has different effects. If all the information in the image is input into the neural network to extract image features, a large number of features of the invalid background and noise will be extracted.
(2) Even if the input image is converted to grayscale and fed as a large matrix into the denoising autoencoder neural network, the amount of input data is too large and processing takes a long time; moreover, the background contains too much information that is irrelevant to the diffuse reflection light intensity distribution of the apple.
(3) Although the apples are placed at a fixed position, each apple has a different radius, so the position of the diffuse reflection surface relative to the CCD also differs. As a result, the position of the strongest diffuse reflection light received by the CCD varies across the images of apple samples of different sizes.

Extraction of the complete spot image
We need the computer to dynamically select the area of the diffuse reflection light intensity distribution in each picture, save it in another folder, and rename it with the corresponding number. This process is called ROI extraction. If the ROI is too large, it slows down the extraction and calculation of effective image features by the neural network. After observing the size of the light intensity distribution in the target area of the 133 images, we found that a dynamically selected 150×150 ROI can completely contain the entire light spot. With the help of OpenCV, code was written to scan each image from top to bottom and from left to right for the point with the highest gray value, that is, the point with the highest light intensity, and to take this point as the center of the apple surface light intensity distribution. The 150×150 image centered on this point is cropped and saved, under the number of the original image, in another folder named ROI. Take the four images in Figure 3(a) as an example: after ROI extraction based on OpenCV, the 150×150 images are displayed in order in Figure 3(b) and stored in order.
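The peak search and fixed-size crop described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `extract_roi` and the border handling are assumptions, and the input is taken to be a 2-D grayscale NumPy array such as the one returned by `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`. A synthetic frame stands in for a CCD image.

```python
import numpy as np

def extract_roi(gray, size=150):
    """Crop a size x size window centred on the brightest pixel of a 2-D grayscale image.

    Assumes the image is at least size x size pixels.
    """
    # np.argmax scans the flattened array in row-major order, so on ties it
    # returns the first maximum met top-to-bottom, left-to-right.
    row, col = np.unravel_index(np.argmax(gray), gray.shape)
    half = size // 2
    height, width = gray.shape
    # Shift the window inwards when the peak lies near a border (an assumption
    # of this sketch) so that the crop always keeps its full size.
    top = min(max(row - half, 0), height - size)
    left = min(max(col - half, 0), width - size)
    return gray[top:top + size, left:left + size], (int(row), int(col))

# Synthetic 576 x 768 frame with a single bright pixel, standing in for a CCD image
frame = np.zeros((576, 768), dtype=np.uint8)
frame[300, 400] = 200
roi, peak = extract_roi(frame)
print(roi.shape, peak)
```

When many pixels share the maximum value, as happens in saturated images, the first maximum found by this scan lies on the upper edge of the bright region rather than at its center, which is worth keeping in mind when the crop size is reduced.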

Optimization of ROI selection
The 150×150 ROI still contains the background noise marked by the red box in Figure 3(b). OpenCV was used to output, in sequence, the image number and the highest gray value of the light intensity distribution for each of the 133 images. In the output, the maximum gray value of most images is 255. The reason is that the power of the laser light source is too high: the strongest pixels in the apple diffuse reflection image exceed the threshold that the CCD can accept and represent. In summary, the 150×150 spot image contains not only the surrounding background noise but also, at the center of the spot, areas where the diffuse reflection light intensity exceeds the CCD receiving threshold and becomes invalid information. Therefore, we must appropriately reduce the size of the ROI and select a suitable area that excludes both the invalid pixels in the center of the 133 pictures and the background noise around the light spot, so that the ROI contains, as far as possible, only effective information about the light intensity gradient distribution. The principle is shown in Figure 3(d) and (e). Assume that the red line in Figure 3(d) passes through a point with the highest gray value of diffuse reflection light in the circular spot; the pixel position along the red line is taken as the abscissa, and the corresponding gray value is plotted on the ordinate. This verifies the code result: there is an invalid part in the center of the spot where the light intensity exceeds the CCD receiving threshold.
As shown in the schematic diagram of Figure 3(e), only the red part is the ROI we need, that is, the diffuse reflection light intensity gradient distribution area. Therefore, the size of the ROI is changed to 24×24; the resulting images are displayed in order in Figure 3(c) and stored, and the code extraction idea is given in Table 1(a).
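To illustrate the idea of a gray-value profile through the spot and a smaller ROI confined to the gradient region, the sketch below finds the saturated run along a row of a 150×150 spot image and crops a 24×24 window beside it. It is only a sketch under stated assumptions: the helper names (`core_row`, `saturated_run`, `gradient_roi`), the choice of the row through the middle of the saturated core, and the placement of the window immediately to the right of that core are all hypothetical, since the procedure of Table 1(a) is not reproduced here. The spot itself is synthetic.

```python
import numpy as np

def core_row(spot, limit=255):
    """Row through the middle of the saturated core (brightest pixel's row if none)."""
    rows, _ = np.nonzero(spot >= limit)
    if rows.size == 0:
        return int(np.unravel_index(np.argmax(spot), spot.shape)[0])
    return int(round(float(rows.mean())))

def saturated_run(profile, limit=255):
    """(start, stop) columns spanned by saturated pixels along a 1-D profile, or None."""
    cols = np.flatnonzero(profile >= limit)
    if cols.size == 0:
        return None
    return int(cols[0]), int(cols[-1]) + 1

def gradient_roi(spot, size=24, limit=255):
    """Crop a size x size window just right of the saturated core (assumed placement)."""
    row = core_row(spot, limit)
    run = saturated_run(spot[row], limit)
    # Without saturation, start at the brightest column of the profile instead.
    left = run[1] if run is not None else int(np.argmax(spot[row]))
    top = min(max(row - size // 2, 0), spot.shape[0] - size)
    left = min(left, spot.shape[1] - size)
    return spot[top:top + size, left:left + size]

# Synthetic 150 x 150 spot: radial fall-off clipped at 255 to mimic CCD saturation
yy, xx = np.mgrid[0:150, 0:150]
dist = np.hypot(yy - 75, xx - 75)
spot = np.clip(400 - 8 * dist, 0, 255).astype(np.uint8)

run = saturated_run(spot[core_row(spot)])
roi24 = gradient_roi(spot)
print(run, roi24.shape, int(roi24.max()))
```

On this synthetic spot the cropped window contains neither saturated pixels nor zero-valued background, which is the property the reduced ROI is meant to have.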

Model training by PyTorch

Data set production
One of the core questions is how to convert the interface of a custom data set into one suitable for the autoencoder. For the data obtained from the experiment to be imported into the neural network model for training, a data loader with a "picture"-"value" input and output format must be customized, according to the characteristics of this data set, on the basis of PyTorch's data loading functions. The idea of the custom data loader is as follows: (1) Use the program to save the path of each ROI picture into a long list, list1, named file; (2) Save the values from the table recording the percentage of fructose content in the juice of the 133 samples into a second long list, list2, in the same order as list1, and name it labels; (3) Divide the two lists into a training set and a test set at a ratio of 3:1, and pass them in through the customization mechanism of PyTorch's Dataset class to complete the custom data set.
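The three steps above can be sketched as a map-style dataset. This is a minimal sketch rather than the authors' loader: the class name `SugarDataset`, the helper `split_3_to_1`, the injected `loader` callable, the shuffling before the split, and the placeholder paths and label values are all assumptions made for illustration. The fallback import only keeps the sketch runnable where PyTorch is not installed.

```python
import random

try:
    from torch.utils.data import Dataset  # PyTorch's map-style dataset base class
except ImportError:  # fallback so the sketch still runs without PyTorch
    Dataset = object

class SugarDataset(Dataset):
    """Pairs each ROI image path with its measured fructose percentage."""

    def __init__(self, files, labels, loader):
        assert len(files) == len(labels), "one label per image is required"
        self.files, self.labels, self.loader = list(files), list(labels), loader

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        # loader turns a path into whatever the model consumes (e.g. a tensor)
        return self.loader(self.files[index]), self.labels[index]

def split_3_to_1(files, labels, loader, seed=0):
    """Shuffle the sample indices (an assumption) and split them 3:1."""
    order = list(range(len(files)))
    random.Random(seed).shuffle(order)
    cut = len(order) * 3 // 4

    def pick(indices):
        return SugarDataset([files[i] for i in indices],
                            [labels[i] for i in indices], loader)

    return pick(order[:cut]), pick(order[cut:])

# Placeholder paths and values standing in for list1 ("file") and list2 ("labels")
files = [f"ROI/{i}.png" for i in range(1, 134)]
labels = [10.0 + 0.01 * i for i in range(1, 134)]
train_set, test_set = split_3_to_1(files, labels, loader=lambda path: path)
print(len(train_set), len(test_set))
```

With 133 samples, integer division gives 99 training samples and 34 test samples. In practice the `loader` argument would read the image and convert it to a tensor, and either dataset can be wrapped in `torch.utils.data.DataLoader` for batching.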

Autoencoder
The autoencoder model framework is shown in Figure 2. The complete code design and execution process is as

Figure 1. The schematic diagram and physical map.
Figure 2. Schematic diagram of autoencoder framework.
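For readers unfamiliar with the technique, a denoising autoencoder for flattened 24×24 ROI images might look like the sketch below. It is not the architecture of Figure 2: the layer widths, the 32-dimensional feature code, the Gaussian noise level, the Adam optimizer and the random stand-in batch are all assumptions chosen only to make the example self-contained.

```python
import torch
from torch import nn

class DenoisingAutoencoder(nn.Module):
    """Fully connected autoencoder that corrupts its input with noise while training."""

    def __init__(self, n_inputs=24 * 24, n_features=32, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(
            nn.Linear(n_inputs, 128), nn.ReLU(), nn.Linear(128, n_features))
        self.decoder = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, n_inputs), nn.Sigmoid())

    def forward(self, x):
        if self.training:
            # Corrupt the input; the loss below still targets the clean image.
            x = x + self.noise_std * torch.randn_like(x)
        code = self.encoder(x)
        return self.decoder(code), code

torch.manual_seed(0)
model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(16, 24 * 24)  # random stand-in for ROI images scaled to [0, 1]

for _ in range(5):  # a few illustrative training steps
    reconstruction, _ = model(batch)
    loss = nn.functional.mse_loss(reconstruction, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

model.eval()  # disables the noise injection
with torch.no_grad():
    _, features = model(batch)
print(features.shape)
```

The low-dimensional `features` tensor plays the role of the image features that are later paired with the fructose values to train the prediction network.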