Oil palm fruit grading using a hyperspectral device and machine learning algorithm

In this paper, a hyperspectral-based system is introduced to detect the ripeness of oil palm fresh fruit bunches (FFBs). The FFBs were scanned using a hyperspectral device, and reflectance was recorded at different wavelengths. A total of 469 fruits from oil palm FFBs (nigrescens, virescens, oleifera) were categorized as overripe, ripe, or underripe. Fruit attributes were measured in the visible and near-infrared (400 nm to 1000 nm) wavelength range. An artificial neural network (ANN) classified the different wavelength regions of the oil palm fruit through pixel-wise processing. The developed ANN model successfully classified oil palm fruits into the three ripeness categories (ripe, underripe, and overripe). The accuracy achieved by our approach was compared against that of the conventional system, which relies on manual classification based on the observations of a human grader. Our classification approach had an accuracy of more than 95% for all three types of oil palm fruits. The research findings will help improve harvesting quality and the grading efficiency of FFBs.


Introduction
Oil palm fruit is typically grown in the tropical areas of Southeast Asia, South Africa and South America [1]. Palm oil provides vital food for millions of people and has been found to be very healthy for the human diet. Palm oil has good resistance to oxidation and to prolonged exposure to high temperatures, thus making it ideal for frying. A high percentage of palm oil is usually added to frying oils because of its structure and other economic advantages. The Malaysian palm oil industry is considered to be highly regulated. A major problem faced by oil palm exporters and producers is the accurate grading of fresh oil palm fruits according to their ripeness levels before processing. The maturity or ripeness of the oil palm fruits dictates the quality as well as overall marketability of the palm oil produced [2]. Oil palm fruits can generally be categorized into four ripeness grades: ripe, underripe, unripe, and overripe [3]. Bunches range in color from yellow and reddish orange to red and black. A fruit before maturity is typically yellow at the base and dark purple to black at the apex. A young palm has 50 to 100 red-violet ripe fruits per bunch. The ratio of oil palm fruit pigments, such as carotenoids and chlorophylls, affects the color of the oil palm fruit. For example, unripe fruits have a higher proportion of chlorophyll that gradually decreases upon maturity [4]. Similarly, carotenoids increase as oil palm fruits mature [2]. Color changes resulting from biochemical reactions can likewise be related to fruit maturity [5].
Among the challenges during oil production is the grading of oil palm fresh fruit bunches (FFBs) in terms of maturity. Workers still employ the conventional method, which relies on their experience to assess the condition of an oil palm fruit bunch visually by making a small cut in the fruits to see the mesocarp color and by counting the number of loosened fruits per bunch [6,7]. The manual grading of oil palm FFBs is a time-consuming and labor-intensive process that is prone to biased appraisal and human error, drastically affecting the growers' profitability [4]. Therefore, a rapid, reliable, and accurate grading technique for the detection of oil palm FFB ripeness is necessary.
Successful automation of this process requires a system that can yield results that are comparable with human grading. The application of a color vision camera system to differentiate the classes of palm oil has recently been studied [8].
Researchers have developed a maturity color index based on different color intensities. Moisture measurements are efficient indicators of the internal features and characteristics of fruit and can thus be used in different applications to obtain valuable information on fruit [9].
One method used to design a non-destructive machine for fruit inspection is the optical imaging technique. Fruits and vegetables have previously been categorized based on physical characteristics [10,11]. Optical sensors have more recently been used for fruit quality detection in various horticultural crops [12][13][14][15]. Cameras used for detecting the maturity of the fruit have optical sensors for multispectral imaging. The techniques used include near-infrared (NIR) reflectance spectroscopy, laser photon counting spectroscopy, and image analysis. García-Ramos et al. [14] reviewed several non-destructive techniques used for determining post-harvest fruit firmness. Multispectral imaging techniques collect spectral information in two or three selected spectral bands. Different features of the target are determined based on the band inspected. The success of a multispectral machine vision system depends on the accuracy of spectral band selection from a range of probable inspection spectra. In infrared color composites, the colors associated with the bands in the 0.7-1.1 μm interval are normally richer in hue and brighter for tree leaves. Representative original spectra of leaves under high water stress, moderate water stress, and no water stress are shown in figure 1.

Figure 1.
Original spectra of leaves with different relative water content (RWC).
A vision system designed to produce high-resolution spectral bands is called a hyperspectral imaging or hyperspectral sensing system. Sensor systems that have been investigated for vision-based fruit bunch grading include optical RGB cameras and hyperspectral imaging cameras [2,5,8,16,17,18,19].
Hyperspectral imaging has also been widely used (mostly on apples) to measure the internal quality attributes of fruits, such as sugar or SSC, flesh and skin color, firmness, acidity, and starch index [20]. These techniques provide evidence of the potential of using optical sensors for FFB maturity determination. This approach provides more useful details for determining the most important spectral bands that can be used to differentiate normal and abnormal apples [21].
Non-parametric machine-classification algorithms are among the simplest and oldest methods of pattern recognition and are suitable for determining the ripeness of fruits [22]. However, most previous studies were conducted under laboratory conditions. In this study, selective visible NIR bands in a portable optical sensor system were used for the determination of oil palm FFB maturity. Compared with RGB images, the more detailed reflectance data of hyperspectral images can provide more information on agricultural quality [23]. To evaluate the internal qualities of oil palm fruit in a laboratory, the use of a non-destructive technique such as hyperspectral imaging enables the testing of a larger number of samples. In such a system, a hyperspectral sensor is used to acquire reflectance data, and artificial neural network (ANN) algorithms are employed to classify three classes of oil palm FFBs (underripe, ripe, and overripe). ANN applications range from data classification to data prediction and data visualization [24][25][26].
The ANN model used in this work was a multilayer feed-forward network with three layers (30 input nodes, 15 hidden nodes, and one output node), as shown in figure 2. A training dataset was used to train the algorithm, whereas testing datasets were used to test the developed (trained) algorithm in terms of predicting the class of the test dataset samples. The reflectance data were used directly as input to the classification algorithm, from which the classification accuracies were determined.
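To make the network shape concrete, the following is a minimal sketch of a forward pass through a feed-forward network with 30 inputs, 15 hidden units, and one output. The tanh activation, the random weight initialization, and the names `init_layer`, `dense`, and `forward` are illustrative assumptions; the paper does not state these details, and the original work was carried out in Matlab®.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j][i] links input i to unit j."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer followed by an activation function."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(reflectance, hidden, output):
    """Propagate a 30-value reflectance vector through the 30-15-1 network."""
    h = dense(reflectance, *hidden, activation=math.tanh)  # assumed activation
    return dense(h, *output, activation=math.tanh)[0]

rng = random.Random(0)
hidden = init_layer(30, 15, rng)   # 30 inputs -> 15 hidden units
output = init_layer(15, 1, rng)    # 15 hidden units -> 1 output
sample = [rng.random() for _ in range(30)]  # hypothetical reflectance values in [0, 1]
score = forward(sample, hidden, output)     # untrained network, so the value is arbitrary
```

With a tanh output the score lies between -1 and 1; training would adjust the weights so that the score approaches the target code of each ripeness class.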
Statistical analysis methods such as the receiver operating characteristic (ROC) are important for determining the difference between the categories in this study. ROC analysis offers a more robust evaluation of the relative prediction performance of alternative models compared with traditional comparisons of relative error [27,28]. ROC is considered a statistical measure for studying the performance of an imaging or diagnostic system with respect to its capability to detect abnormality accurately and reliably [true positive (TP)] without providing false detection [29]. In other words, ROC analysis provides a systematic analysis of the sensitivity and specificity of a diagnosis [30][31][32]. The true negative (TN) and TP indexes represent agreement with the classification of a human expert classifier. The false negative (FN) and false positive (FP) indexes represent disagreement in classification. At the end of each epoch (e), when all validation patterns have been presented to the ANN classifier, the statistical indexes of that epoch, including sensitivity and specificity, are calculated for each threshold (t) [29]. Sensitivity refers to the capacity of the classifier to identify a positive pattern among truly positive patterns, as shown in figure 3. The value of the sensitivity varies between 0 (when FN ≠ 0 and TP = 0) and 1 (when TP ≠ 0 and FN = 0). A smaller number of FNs denotes a higher sensitivity of the test. The sensitivity values are plotted along the ordinate axis and are given by:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity denotes the capability of the classifier to identify negative patterns among the truly negative patterns. The specificity varies between 0 and 1. The (1 - specificity) values are arranged along the abscissa axis and are given by:

$$1 - \text{Specificity} = \frac{FP}{FP + TN}$$

The ROC curve is a Cartesian graph that represents the dependency between the sensitivity and specificity of a classification system. An ideal classifier has sensitivity = 1 (TP rate = 1) and specificity = 1 (FP rate = 0) [30].
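The two indexes can be illustrated with a short sketch that counts TP, FP, TN, and FN at one threshold and derives sensitivity and specificity from them. The scores, labels, and threshold below are made-up values for illustration only, and the function names are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of truly positive patterns that were identified as positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Fraction of truly negative patterns that were identified as negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0

# Hypothetical classifier outputs and expert labels (1 = positive, 0 = negative)
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(scores, labels, threshold=0.5)
print(sensitivity(tp, fn), specificity(tn, fp))
```

Repeating the calculation for every threshold yields the set of (1 - specificity, sensitivity) pairs that make up the ROC curve.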
The ROC curve is an alternative approach to achieve accuracy in the evaluation of learning algorithms on natural datasets. The key assumption of ROC analysis is that TP and FP rates describe the performance of the model independent of class distribution. This analysis is conducted to provide a more robust comparative evaluation of the expected performance based on target data compared with a simple comparison of error, which assumes that the observed class distribution does not reflect any differences in the cost of different types of error. ROC analysis is of value in the evaluation of expected classifier performance under varying class distributions.
ROC curves describe the predictive behavior of a classifier independent of class distributions or error costs, thus enabling the decoupling of classification performance from these factors. ROC analysis is often called the ROC accuracy ratio, a common technique used to determine the accuracy of default probability models. The area under the ROC curve (AUC) can be interpreted as the probability that a classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance [33].
AUC is a significant measure of the accuracy of ripeness determination. If AUC is equal to 1, then the ROC curve comprises two straight lines: one vertical from (0, 0) to (0, 1) and the other horizontal from (0, 1) to (1, 1). Such a test is 100% accurate because both sensitivity and specificity are 1.0, thus yielding no FPs and no FNs. In contrast, a test that cannot distinguish between normal and abnormal corresponds to an ROC curve that is a diagonal line from (0, 0) to (1, 1). The ROC area for this line is 0.5. ROC curve areas are therefore typically between 0.5 and 1.0, and the value of AUC always satisfies the inequalities 0 ≤ AUC ≤ 1 [42]. An AUC close to 1 (the area of the unit square) indicates a very reliable diagnostic test.
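As an illustration of how an AUC value arises from a set of scores, the sketch below sweeps the decision threshold, collects the ROC points, and integrates them with the trapezoidal rule. The data are invented for the example and the function names are hypothetical; the paper does not describe how its AUC values were computed.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for all thresholds."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical scores: one positive sample is ranked below one negative sample
example = auc(roc_points([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0]))
# Perfectly separated scores trace the two straight lines through (0, 1)
perfect = auc(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
# Identical scores for every sample trace the diagonal from (0, 0) to (1, 1)
uninformative = auc(roc_points([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))
print(example, perfect, uninformative)
```

In the first example, five of the six positive-negative pairs are ranked correctly, which matches the trapezoidal area of 5/6.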
Based on the literature review, the use of indoor hyperspectral scanner devices for differentiating oil palm ripeness is still limited. This study was carried out to develop an automated system for oil palm fruit bunch grading using a hyperspectral scanner technique. In addition, several wavelengths were investigated to distinguish between the three categories of oil palm fruit ripeness. The detailed objectives of this research are as follows:
i. To identify the relevant technologies to ensure that only ripe oil palm bunches are collected.
ii. To design and build an intelligent prototype for a real-time grading system using a hyperspectral scanner.
iii. To test and validate the developed system through actual oil palm bunch collection.
From the statistical pattern recognition point of view, three band selection methods will be applied. The main aim of the study described in this paper was also to develop a hyperspectral technique that can assist the quality evaluation and classification of oil palm fruit.

Materials and methods
The flow chart of the methodology is given in figure 4. Details of each step are given in the following sections. In this paper, a total of 469 bunches evaluated by inspectors were allocated, tested, and divided into three types (nigrescens, virescens, oleifera). For each type of oil palm fruit bunch, three categories of ripeness (underripe, ripe, and overripe) were qualitatively determined by a human expert, as shown in figure 5. All samples were freshly taken from the MPOB farm area at Kluang, Johor, Malaysia. All fruits from the same bunch were in a similar state of ripeness, although their color and size may vary with location in the bunch.

Hyperspectral device preparation
The hyperspectral active sensor system used for data collection is shown in figure 6. The image acquisition device utilized for this study has a high resolution of 1600 × 1200 pixels and a pixel depth of 12 bits/pixel, with 824 spectral bands from 400 nm to 1000 nm. In this study, only reflectance measurements were analyzed. The hyperspectral imaging system employed in this research enabled different configurations for imaging in the visible NIR range. The hyperspectral system is very important to achieve good accuracy. and 50 W) that were tripod mountable and especially designed for indoor laboratory diffuse reflectance measurements over the 400 nm to 2500 nm region were switched on to ensure even light distribution. SpectralDAQ was launched from the laptop. Spectral imaging properties were launched automatically upon selection of the desired band. Using the camera controls, the frame and exposure time was adjusted to 15 ms. Through monitoring, we ensured that the profile plot peak neither saturated nor exceeded 4096. The camera height was accordingly adjusted to 1.1 m from the object height. For the OLE23 objective lens, focus was adjusted using the calibration sheet, such that the image was sharp and not blurred. The x-stage scanner of SpectralDAQ was used to control the scan mirror for the determination of the start and end points along the sample area. Scan rate was adjusted by visually comparing the ratio of actual length to width based on the viewed image. These steps ensured that the spectral camera was ready for the assessment of the oil palm fruit bunch. The oil palm fruit and the white reference were placed on the x-stage platform, and the start and end positions were defined. The hyperspectral imaging system was calibrated both spectrally and spatially by using the following procedures. Spectral calibrations employed eight lamps. For the spatial calibration, a white paper printed with thin parallel lines 2 mm apart was placed at the sample holder.
The calibration results showed that the system was highly linear and that the distortion of spectral and spatial information was within one pixel on the charge-coupled device detector. Thus, no spectral and spatial corrections were needed for the system. The camera and spectrograph were used to scan the oil palm fruit line by line as the conveyor moved the oil palm fruit through the field of view of the optical system. The oil palm fruits were manually placed on the translational plate, which was covered by a rubber mat to prevent the reflection of light, with the region of interest facing the camera. After the scans on one entire fruit bunch were finished, the spatial-by-spectral matrices were combined to construct a three-dimensional (3D) spatial and spectral data space. The scanning time for one oil palm was dependent on the integration time used for the camera, which was fixed to 200 ms and 5 s, and on the size of the oil palm. The hyperspectral data must be normalized with a standard reference under the same illumination system setup to establish a reflectance coefficient at each pixel location. The normalization process can be expressed as:

$$R_{\text{factor}}(x,y,z) = \frac{I(x,y,z) - I_{\text{dark}}(x,y,z)}{I_{\text{ref}}(x,y,z) - I_{\text{dark}}(x,y,z)} \times 10000$$

where $I(x,y,z)$ is the reflectance radiation intensity of spectral band z at pixel location (x,y), $I_{\text{ref}}(x,y,z)$ is the radiation intensity of a white reference tile with a known reflectance coefficient at band z and pixel location (x,y) under the same illumination light state, $I_{\text{dark}}(x,y,z)$ is the noise of the sensor at band z and pixel location (x,y) under no light reflectance, and $R_{\text{factor}}(x,y,z)$ is the normalized reflectance factor used for every pixel (x,y) at band z. A scaling factor of 10000 was used to increase the dynamic range of the reflectance, as shown in the above equation.
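The normalization step can be sketched for a single band as follows. The sketch assumes the common white and dark reference form, in which the dark signal is subtracted from both the sample and the white reference before taking their ratio, scaled by 10000; the function name, the handling of non-positive denominators, and the pixel values are illustrative assumptions.

```python
def normalize_reflectance(raw, white, dark, scale=10000):
    """Pixel-wise reflectance factor for one band from raw, white, and dark frames."""
    out = []
    for raw_row, white_row, dark_row in zip(raw, white, dark):
        row = []
        for i, w, d in zip(raw_row, white_row, dark_row):
            span = w - d
            # Guard against a dead pixel where white and dark readings coincide
            row.append(scale * (i - d) / span if span > 0 else 0.0)
        out.append(row)
    return out

# Hypothetical 2 x 2 frames of 12-bit digital counts for a single band
raw = [[2100, 1100], [600, 4000]]
white = [[4000, 4000], [4000, 4000]]
dark = [[100, 100], [100, 100]]
reflectance = normalize_reflectance(raw, white, dark)
print(reflectance)
```

A pixel as bright as the white reference maps to 10000, and a pixel at the dark level maps to 0; applying the function to every band yields the calibrated data cube.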
The normalization and rearrangement processes comprised the calibration process of real hyperspectral data. After the calibration process, the 3D data cube is considered a 3D image cube. The acquired hyperspectral images were processed using the Environment for Visualizing Images (ENVI 4.7) for classification-related operations such as image subsetting, image resizing, noise filtering, and region of interest (ROI) selection. About 35000 pixels were manually selected from every corrected image as the ROI. The average spectrum from the ROI of the normal surface of each fruit was calculated by averaging the reflectance spectral values of all pixels in the ROI. An 11×11 low-pass filter was applied to remove the image noise. A low-pass filter maintains the low-frequency components of the image, which smooths it. The low-pass filter has the same weight in every kernel element and replaces the center pixel value with the average of the neighboring values. The denoised image was then used with data processing software such as Matlab® to analyze the FFB classification.
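The two operations described above, equal-weight low-pass filtering and ROI averaging, can be sketched in a few lines. This is only an illustration of the idea and not the ENVI implementation; the border handling (averaging only the in-bounds neighbors), the `cube[band][row][col]` layout, and the function names are assumptions.

```python
def mean_filter(image, size=11):
    """Replace each pixel with the equal-weight average of its size x size window."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Near the border, average only the neighbors inside the image (assumed rule)
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out

def roi_mean_spectrum(cube, roi_pixels):
    """Average the reflectance of the ROI pixels in each band of cube[band][row][col]."""
    return [sum(band[r][c] for r, c in roi_pixels) / len(roi_pixels) for band in cube]

# A single bright pixel is spread evenly over the 121 elements of the kernel
spike = [[0.0] * 13 for _ in range(13)]
spike[6][6] = 121.0
smoothed = mean_filter(spike)

# Hypothetical two-band cube with a two-pixel ROI
cube = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
spectrum = roi_mean_spectrum(cube, [(0, 0), (1, 1)])
print(smoothed[6][6], spectrum)
```

The mean spectrum produced this way, one value per band, is the kind of reflectance vector that can then be passed to a classifier.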

Data analysis
A total of 469 fruits were inspected and distributed into three types of oil palm fruit, each with three classes (underripe, ripe, and overripe). These samples were analyzed in two stages. In the first stage, every type of oil palm fruit was analyzed individually to obtain the reflectance for each class of fruit and to determine which wavelengths can distinguish between the three categories. In the second stage, all types of oil palm fruit were analyzed together at a specific wavelength, selected according to the percentage of high reflectance for each type of oil palm fruit. The samples were randomized and separated into independent training and testing datasets (75:25) to evaluate different classification algorithms. The replicates for each sample were averaged prior to further analysis. Matlab® was used for the analysis of the spectral data. This section analyzes the statistical method most generally used in the scientific literature for this purpose: the artificial neural network (ANN).
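The randomized 75:25 partition can be sketched as below. The fixed seed, the rounding rule, and the function name are assumptions made for reproducibility of the example; the paper does not say whether the split was stratified by class or type.

```python
import random

def split_train_test(samples, train_fraction=0.75, seed=0):
    """Shuffle the samples and split them into independent training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Sample indices stand in for the 469 fruits
train, test = split_train_test(range(469))
print(len(train), len(test))
```

With 469 samples this rule gives 352 training and 117 testing samples, and no sample appears in both sets.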

Spectral reflectance of the Virescens fruit
The typical shapes of the ripe, underripe, and overripe oil palm fruit reflectance spectra are presented in figure 8. The most important difference is the chlorophyll absorption dip (around 675 nm), which disappears as the fruit ripens. Virescens oil palm fruits had three broad absorption bands around the 520, 670, and 970 nm regions. The relative reflectance increases steadily over the wavelengths 700-880 nm. The spectral bands from 750-910 nm differentiate between the three categories of the fruit and clearly demarcate their ripeness based on wavelength.

Spectral reflectance of the Oleifera fruit
The typical shapes of the ripe, underripe, and overripe oil palm fruit reflectance spectra are presented in figure 9. The most important difference is the chlorophyll absorption dip (around 675 nm), which disappears as the fruit ripens. Oleifera oil palm fruits had three broad absorption bands around the 520, 670, and 970 nm regions. The relative reflectance increases steadily over the wavelengths 680-900 nm. The spectral bands from 750-910 nm differentiate between the three categories of the fruit and clearly demarcate their ripeness based on wavelength. The spectral bands from 710-940 nm showed the best test accuracy across the three types of oil palm fruit, as shown in Table 2. The virescens type has a stable accuracy of 83.3 % over the range from 770 nm to 870 nm, and the nigrescens type has the same accuracy of 83.3 % over the range from 830 nm to 890 nm. The two ranges overlap from 830 nm to 870 nm, and 830 nm is considered the best wavelength to differentiate between the three categories of oil palm fruits. In comparison, the oleifera type has only medium accuracy in a limited region, so there is no overlap between oleifera and the other two types of oil palm fruit; 880 nm is considered the best wavelength for the oleifera type.
Table 2. Overall classification accuracy for the three types of oil palm fruits using the CHAID growing method.
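The overlap between the stable wavelength ranges reported for virescens and nigrescens can be checked with a small interval intersection. The function name is hypothetical, and the ranges are the ones quoted in the text.

```python
def overlap(range_a, range_b):
    """Return the shared (low, high) wavelength interval in nm, or None if disjoint."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return (low, high) if low <= high else None

virescens = (770, 870)    # nm, stable accuracy range for virescens
nigrescens = (830, 890)   # nm, stable accuracy range for nigrescens
shared = overlap(virescens, nigrescens)
print(shared)
```

The lower edge of the shared interval is 830 nm, the wavelength singled out for these two types.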

Figure 11.
Reflectance values for all data samples (nigrescens, virescens, and oleifera) at best wavelength 830 nm.

ANN-MLP evaluation using receiver operating characteristic
The actual performance of the ANN-MLP network was evaluated by receiver operating characteristic (ROC) analysis and the area under the curve (AUC). ROC analysis is related in a direct and natural way to the cost/benefit analysis of decision making.

Optimal neural network classifier results
During the training stage, a total of 469 samples from the datasets (nigrescens, virescens, oleifera) were presented as the full set of input samples to the ANN-MLP. Each sample was graded as under-ripe (-1), ripe (0), or over-ripe (1). The maximum sum squared error (SSE) was empirically set at 10e-4, and the process was carried out for 10e6 epochs. The network was trained to classify the ripeness of each of the three types of oil palm fruit individually, as shown in figure 12, as well as of all data together.
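The grade coding and the SSE stopping criterion can be illustrated with the sketch below. The nearest-code decoding rule and the example outputs are assumptions for illustration; the paper does not state how a continuous network output was mapped back to a grade.

```python
LABELS = {-1: "under-ripe", 0: "ripe", 1: "over-ripe"}
SSE_GOAL = 10e-4  # value quoted in the text, equal to 0.001

def sum_squared_error(targets, outputs):
    """Sum of squared differences between target codes and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

def decode(output):
    """Map a continuous output to the grade with the nearest code (assumed rule)."""
    nearest = min(LABELS, key=lambda code: abs(code - output))
    return LABELS[nearest]

# Hypothetical network outputs for one sample of each grade
targets = [-1, 0, 1]
outputs = [-0.9, 0.1, 0.8]
sse = sum_squared_error(targets, outputs)
converged = sse <= SSE_GOAL   # training would stop once this becomes True
grades = [decode(o) for o in outputs]
print(sse, converged, grades)
```

In this example the outputs decode to the correct grades even though the SSE of about 0.06 is still above the goal, so training would continue.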

Figure 12. ANN-MLP model for ripeness classification for (a) nigrescens, (b) virescens, and (c) oleifera.
To evaluate the ANN model for ripeness classification, which distinguishes between the under-ripe, ripe, and over-ripe grades for the three types of oil palm fruits, the same sets were examined. The ripeness classification was evaluated by the AUC score measured from the ROC curve, as shown in figure 13. Figure 13 (a, b, c) depicts the ROC graphs for nigrescens, virescens, and oleifera obtained by the ANN-MLP, with the highest AUC score obtained for grade recognition in each of the three datasets. The results in figure 13 (a) indicate that the selected band features with the ANN-MLP classifier perform well, with an AUC test accuracy of 94.54 for nigrescens ripeness. In addition, the ANN-MLP classifier performed at a higher rate for virescens ripeness, with an AUC test accuracy of 98.67, as shown in figure 13 (b). Meanwhile, figure 13 (c) shows the ROC graph of oleifera ripeness classification performed by the ANN-MLP on the selected band features, with an AUC score of 97.89. Figure 14 shows the ROC graph of ripeness classification for all data performed by the ANN-MLP on the selected band features, with an AUC score of 95.73. Overall, the AUC score is an indicator of classification performance, and the AUC test accuracy is higher than 94 in all cases.
The ripeness classification performed by the ANN-MLP decision system to recognize the selected bands features using all datasets showed great performance results.

Conclusions
This paper proposed a framework for oil palm fruit bunch grading using a hyperspectral scanner technique. The ripeness detection of oil palm fruit bunches suffers from low productivity and efficiency problems, and bunches are still graded by human visual inspection. Research studies have shown that the major cause of these problems can be traced back to the lack of an automated inspection system. One commonly cited means to overcome this problem is through advanced technologies. However, the challenge arises when a decision has to be made to choose the best technology to fulfill the present needs, since each system has its own technical, economic, and risk considerations. The goal of this paper was to develop a prototype that can help palm oil companies to choose the best investigation system that fits their needs.
After the images were enhanced, the resulting wavelength and reflectance data were processed for classification. After analysis of the spectral band region between 750 and 900 nm, the best wavelength to distinguish all data of oil palm fruit (nigrescens, virescens) was 830 nm, as shown in figure 10. Nigrescens and virescens have the same reflectance values in the three categories, whereas for the oleifera type it is difficult to differentiate between two categories (underripe and ripe). This means that nigrescens and virescens have almost the same characteristics and properties, whereas oleifera does not share them. The classification was performed by the ANN. The developed system shows high classification accuracy for ripeness detection for the three types of oil palm fruits separately (nigrescens, virescens, and oleifera), with rates of 94.54 %, 98.67 %, and 97.89 %, respectively, using the ANN-MLP classifier. The classification accuracy of ripeness detection for all data (nigrescens, virescens, and oleifera) was 98.67 %.
The solution discussed here is an oil palm ripeness detection system. By using a set of carefully selected hardware for indoor use, the degree of ripeness can be detected with high accuracy. The developed system can be used to enhance and optimize the entire grading process. The proposed prototype is not only practical for detecting the ripeness of the bunches in terms of design and development, but also very affordable, since the components of the proposed system are not very expensive. Therefore, further research in the area of hyperspectral imaging has great potential and can provide technological advancement for ripeness detection of oil palm fruits.
In summary, the application framework improves the harvesting operation and offers the potential for improved workflow reliability and grading performance and for effortless derivation of performance indicators in project management. This study will be useful to the oil palm industry, oil palm engineers, oil palm harvesters, graders, mill operators, plantation managers, smallholders, and the research community.