Development of Harumanis Mango Insidious Fruit Rot (IFR) Detection by Utilising Vibration-Based Sensors and PCA with Random Forest

Non-destructive techniques, using single or multiple modalities, have been used to assess the quality of mango (Mangifera indica L.). The harvest maturity stage that yields the optimum postharvest quality is difficult to predict and varies by cultivar. Insidious Fruit Rot (IFR) is a disease that affects mangoes. When infected with IFR, the Harumanis mango variety does not exhibit external damage at the time of harvest or during the mature stage. However, a lack of density in the sinus area can occasionally be detected. Traditional ways of locating diseases or pests inside the mango are impractical for commercialisation of the product. This research presents an investigation of IFR infection detection using piezoelectric vibration sensors and electret microphones. Data from the sensors were processed using the PCA and Random Forest methods to distinguish non-IFR mangoes from those afflicted with IFR. The proposed approach achieved correct classification and is expected to help planters detect IFR before Harumanis mangoes are marketed.


Introduction
Currently, mangoes are mostly produced in India.[1] Most of the mango tree's parts, including the leaves, bark, gum, sap, and seeds, are medicinal. Because mango leaves are anti-microbial and emit a lot of oxygen, they help to maintain cleanliness and good health.[2] Mangoes are beneficial to the digestive system and are a rich source of vitamins, fibre, and antioxidants. Additionally, mangoes have a delightfully sweet flavour and a lovely scent.[3][4][5] Fruit diseases that affect the growth and production of high-quality fruit include Anthracnose (Colletotrichum gloeosporioides), Mango Sooty Mould (Meliola mangiferae Earle), Pink Disease (Corticium salmonicolor), Stem End Rot (Botryodiplodia theobromae), and Insidious Fruit Rot (IFR), also known as "Soft Nose" or "Spongy Tissue".[6][7][8][9] When infected by IFR, Harumanis, a mango variety popular in Malaysia, does not exhibit external damage at the time of harvest or at the mature stage.[10] However, a lack of density in the sinus area can occasionally be detected.[7,11] Farmers have traditionally used several techniques to assess the quality of Harumanis, including the floating technique, water displacement, and a manual acoustic method in which the fruit is flicked with the fingers to assess the lack of stiffness at the sinus region. These techniques rely on the concept of density to predict the interior characteristics and quality of the fruit.[11,12] However, the reliability and usefulness of those techniques are much disputed. Quick, convenient, and inexpensive techniques must be developed to meet consumer demand for fresh and healthy fruits.[13] Featured below is the Harumanis mango farm in the Institute of Sustainable Agrotechnology (INSAT), UniMAP in Sungai Chuchuh, Padang Besar, Perlis, Malaysia, where many Harumanis mango trees were planted for experimental purposes. This study describes the methodologies and procedures of a non-invasive grading system for IFR detection utilising an electret microphone and a piezoelectric vibration sensor with Machine Learning (ML) for supervised and unsupervised data processing. The objective is to simplify the mango IFR grading system and help cultivators manage the quality of their produce.

System Development
The project was divided into hardware and software development so that both aspects were ready before they were used in the experiment.

Hardware Development
For hardware assembly, the sensors chosen were the electret microphone and the piezoelectric vibration sensor. The microcontroller board was the Arduino Uno. A 5 V push-pull solenoid was used to create the pulsation. The experiment consists of 2 types of input and 1 microcontroller. The inputs come from 2 electret microphones and 1 piezoelectric vibration sensor. The output was observed using a computer. The 2 electret microphones were positioned by the nose of the mango, while the piezoelectric vibration sensor was positioned under the mango, supporting its weight. The 5 V push-pull solenoid was positioned directly in front of the apex region of the mango. Figure 3 shows the schematic diagram of the complete setup for this experiment.

Software Development
For software development, MATLAB was used to produce the output of Principal Component Analysis (PCA), an unsupervised dimensionality reduction technique. The resulting 3D plot was examined to choose the best combination of components for further analysis. WEKA was used as a supervised ML tool to further assess the data obtained from the experiments using Random Forest (RF), an ensemble of decision trees in which each tree is built independently and offers a prediction, and the outcome is determined by combining the predictions of all the individual trees.
Setting up the software before conducting experiments is essential for data analysis, experiment control, simulation, statistical analysis, data visualisation, experiment design, data security, code management, collaboration, and reproducibility. It ensures that the data collected are properly processed and analysed, leading to meaningful and reliable results that contribute to scientific knowledge.
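The way an RF combines the predictions of its individual trees can be illustrated with a simple majority vote. This is only a minimal sketch in Python, not the WEKA implementation used in this study, and WEKA may combine tree outputs differently. The function name `majority_vote` and the example votes are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)
    # most_common(1) returns a one-element list: [(label, count)]
    return counts.most_common(1)[0][0]


# Hypothetical votes from five trees for a single mango sample
votes = ["IFR", "non-IFR", "IFR", "IFR", "suspect-IFR"]
predicted = majority_vote(votes)
```

In this illustration three of the five trees vote for the IFR class, so the ensemble reports IFR even though two trees disagree.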

Methodology
Harumanis mango samples were obtained from Kampung Banggol Sena, Mata Air, Perlis, Malaysia. The mangoes were first inspected by human experts to determine their state for initial classification and were then brought to the laboratory for further investigation. For the experiment, each mango was weighed and measured for documentation purposes.
The flow and the code for the data collection are shown in figure 4. Data collection starts with the push button, after which the microcontroller gathers pre-data from the sensors for comparison purposes. This is essential to determine whether the experiment has an impact on the sensor output. Then, the push-pull solenoid was energised and de-energised to create pulsation for the post-data collection. The data were then displayed in the serial monitor on the computer. The count n was set to 100, and the collection was repeated for 15 cycles each. The collected data were then processed using MATLAB for PCA and WEKA for RF. In MATLAB, the data were used to produce a 3D model that simplified the raw data for better inspection.
According to a survey article, PCA is the most frequently employed Statistical Process Monitoring (SPM) technique.[14] It should be emphasised that the original PCA has two significant flaws, namely the lack of sparsity in the loading vectors and the fact that each principal component is a linear combination of all features, which may not fully reflect all plausible relationships between the features.[15]
Meanwhile, Random Forest (RF) is based on the Decision Tree algorithm. Random forests are ensemble learning techniques used for classification and regression problems. An RF is an ensemble of decision trees that are built using bootstrap samples. Each individual tree has high variance on its own, but RF averages several such trees, which lowers the variance and yields a strong classifier. RF typically requires little parameter adjustment, which distinguishes it from many of the ML models currently in use. In order to build tree-based models such as RF, it is necessary to take samples from the dataset, select fewer attributes, and identify the value that best splits the dataset.[16] Different decision trees, such as the ones in figure 6, are combined to form an RF classifier. The foundation of RF is the wisdom of crowds, a straightforward yet potent theory, which states that when many unrelated individuals participate as a committee to make a forecast, the outcome is more likely to be accurate than if only one person made it. This effect arises because the trees protect one another from their individual errors.[17]
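The PCA step that reduces the sensor readings to three principal components for a 3D plot can be sketched as follows. This is an illustrative Python/NumPy version, not the MATLAB code used in this study. The function name `pca_project`, the matrix shape (45 readings by 6 features), and the random placeholder data are assumptions made only for the example.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project the rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    # Centre each feature (column) at zero mean
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    # Fraction of the total variance captured by each component
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Placeholder data standing in for real sensor features (hypothetical shape)
rng = np.random.default_rng(0)
X = rng.normal(size=(45, 6))
scores, explained = pca_project(X, n_components=3)
```

The three columns of `scores` correspond to PC 1, PC 2 and PC 3 and could be passed to any 3D scatter plotting routine, while `explained` indicates how much of the variance each retained component carries.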

Figure 5. The difference between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL), shown as an onion diagram. [18]
Bagging and feature randomness are the foundations of the RF classifier. Bagging refers to building each decision tree from a random sample of the dataset drawn with replacement. Each decision tree is generated using a sample of size N drawn with replacement from a training set of size N, while each node randomly selects a subset of Y features from the X input features. Each tree is grown to its full extent without being pruned. Since each tree is created from a random set of features and its outcome is unrelated to those of the other trees, the fundamental benefit of the random forest classifier is that the model is less prone to overfitting.[19]

Figure 6. Example of Random Forest decision tree branch. The figure illustrates the relationship between each decision tree.
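The two sources of randomness described for the RF classifier, bagging and feature randomness, can be sketched in a few lines of Python. This is a minimal illustration under assumed sizes, not the WEKA procedure. The function names and the values of N, X and Y (10, 6 and 2) are hypothetical.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, subset_size, rng):
    """Pick subset_size distinct feature indices to consider at one node."""
    return sorted(rng.sample(range(n_features), subset_size))


rng = random.Random(42)
n_samples, n_features, subset_size = 10, 6, 2  # hypothetical N, X and Y

# Rows used to train one tree; some indices may repeat, others may be absent
rows = bootstrap_indices(n_samples, rng)
# Features one node of that tree is allowed to split on
features = random_feature_subset(n_features, subset_size, rng)
```

Because the rows are drawn with replacement, each tree sees a slightly different version of the training set, and because each node considers only a few features, the trees are less likely to make the same mistakes.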

Results and Discussion
IFR in Harumanis mango is not easily discerned by eye and can only be inferred using traditional methods. Such inspection is therefore not the most reliable way to classify a Harumanis mango. Using sensors to determine its characteristics is an alternative way to determine the state of the fruit. Both PCA and RF were used to measure and determine the predictive success of this system. PCA was used due to its ability to reduce the complexity and noise of the data and highlight the most important features and relationships.[20] WEKA, in turn, provides the flexibility to access its functionalities programmatically, enabling automation and integration into larger systems.[21] The application of both PCA and RF created a better and more stable output for the data obtained in this study. Figure 7 is the outcome of using the PCA algorithm in MATLAB, and figure 8 is the scatter plot produced using WEKA.
The plot in figure 7 makes it evident that the IFR-affected Harumanis mango has a distinctive plot, although the suspect-IFR and non-IFR mangoes have some traits in common, thus exhibiting a value of 88.09%. All the samples were classified initially by human experts, and a destructive method was applied to the samples after the investigation to determine their internal properties. It was found that the samples had been sorted correctly, in accordance with their initial examination by human experts.
The coefficient of determination (R²), which indicates how well the fitted model accounts for the variation in the data, is a crucial number when constructing a regression line. Additionally, it shows how strongly variables correlate with one another, which can be significant in many kinds of data processing. The best value of R² found in this study is 0.9295, with a correlation coefficient (R) of 0.9641, both obtained from the RF method. PCA, by comparison, exhibits an R of 0.8140 and an R² of 0.6625.
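The relationship between R and R² reported above can be checked with a short calculation. The sketch below, in Python, assumes the simple case in which R² is the square of the Pearson correlation coefficient. The function name `pearson_r` is hypothetical, and only the RF values of R and R² are taken from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# R reported for the RF method in this study
r_rf = 0.9641
# Squaring R and rounding to four decimals gives the reported R² of 0.9295
r2_rf = round(r_rf ** 2, 4)
```

Squaring the reported RF value of R reproduces the reported R², which suggests the two figures are consistent with each other.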
From the results in table 1, it was determined that the RF results more convincingly show that the sensors and the method used were working well. This is plausible, since RF can train the system to make better classifications while taking all aspects of the inputs into consideration. Meanwhile, PCA explicitly reduces the dimensionality of the data by selecting a subset of principal components.

Conclusion
Experiments using an electret microphone and a piezoelectric vibration sensor with PCA and RF were completed. However, other sensors can be utilised to improve the outcome of this study in the future. An ultrasonic pulser will be added to the project to obtain knowledge about the inner composition and characteristics of the Harumanis mango while remaining non-invasive. Ultrasonic waves, which are created by converting electrical energy into mechanical vibrations in a piezoelectric device, can pass through material, so the use of a vibration sensor is also maintained. Deep Learning (DL) methods will be used to further enhance the outcome of the study, as the current methods were used as an early exploration of the subject.
Nevertheless, the proposed approach was able to distinguish the IFR-afflicted Harumanis mango from the non-IFR and the mildly afflicted ones. The findings demonstrate that the sensors used can detect IFR in mangoes. It was also shown that the data fusion of an electret microphone and a piezoelectric vibration sensor, processed using PCA and RF, can be a viable substitute for a human expert in the non-destructive evaluation of IFR in mangoes.

Acknowledgement
This study was funded by the Ministry of Higher Education Malaysia under the Federal Government Scholarship Hadiah Latihan Persekutuan (HLP) of the Department of Polytechnic Education and Community College (PolyCC), and was also supported by the Faculty of Electrical Engineering Technology (FTKE), Universiti Malaysia Perlis (UniMAP).

Figure 2. Block diagram of the experiment. There are 2 inputs, the electret microphone and the piezoelectric vibration sensor.

Figure 3. Schematic diagram of the experiment. The figure depicts the Harumanis mango experiment from the top view.

Figure 4. Flowchart of the program.

Figure 7. 3D model using PCA method. The relationship between PC 1 and PC 3 was observed.

Figure 8. Scatter plot using Random Forest method. Scatter plot acquired after processing data of PC 1, PC 2 and PC 3.

Table 1. System performance result