Improving the quality of coffee yield forecasting in Dak Lak Province, Vietnam, through the utilization of remote sensing data

This research aimed to identify sensitive areas for Robusta coffee trees in Dak Lak province, Vietnam, where frequent droughts caused fluctuations in productivity. To improve yield forecasting, a mask was developed to extract potential predictive variables from satellite-derived vegetation indices (VIs). Correlation coefficients between VIs and coffee yield were analyzed to determine sensitive areas, and grid cells with high multiple correlation coefficients and a variable over time were used to build the mask for extracting VIs as predictor variables. The study found that sensitive areas had more challenging farming conditions than long-term crops, and the Vegetation Health Index was the most appropriate index for predicting coffee yield. The forecast quality for 6-8 months in advance was relatively high, with a ‘Willmott’s index of agreement’ ranging from 0.85 to 0.97 and the Mean Absolute Percentage Error ranging from 4.9% to 7.5%. Compared to previous research, the forecast quality has significantly improved. This study provides valuable insights for predicting coffee yield in Dak Lak and highlights the importance of considering sensitive areas and VIs for accurate forecasting.


Introduction
Robusta coffee is a plant of great economic value, bringing significant foreign currency to many coffeeproducing countries.It is an essential source of foreign exchange for these countries (Dinh et al 2022).Timely and accurate prediction of Robusta coffee yield is crucial for the profitability of the global coffee industry (Kouadio et al 2021).Dak Lak, the largest coffee-growing area in Vietnam, covers an area of 13,030 km 2 and has a population of 1.919 million people with a density of 147 people per km 2 (figure 1).The region is renowned for its coffee-growing area, which spans over 2,100 square kilometers and accounts for 16.1% of its total area (Statistical Yearbook 2021).
Dak Lak accounts for 31% of Vietnam's total coffee production.Coffee exports play a significant role in the province's economy, representing 86% of its total agricultural exports and over 60% of its annual income (Huong and Anh 2019).Robusta coffee is the primary coffee variety grown in Dak Lak and comprises 12% of the world's total Robusta coffee area (Huong andAnh 2019, Dinh et al 2022).The province experiences a dry season that lasts from December to the end of April, often resulting in drought (Phan et al 2009, Nguyen et al 2014).Given the fluctuations in coffee yield due to droughts and hot weather, there is a need for early warnings to ensure the profitability of the coffee industry in Dak Lak.
Many studies have shown that coffee is highly sensitive to changes in temperature and precipitation levels (Craparo et al 2015, Kath et al 2020, Venancio et al 2020, Byrareddy et al 2021, Dinh et al 2022).According to Venancio et al 2020, droughts and high temperatures have a negative impact on coffee production, with rising temperatures having a more significant effect than decreases in annual rainfall.This study also found that coffee yield is affected by the previous growing seasons.Byrareddy et al (2021) reported that drought reduced Robusta yields by an average of 6.5% across all the Central Highlands of Vietnam, resulting in a significant decrease in gross margins, with an average reduction of 22% compared to years with moderate rainfall.The impact of rising temperatures on Robusta coffee in Dak Lak from January to October would be a decrease in yield, which would be inversely proportional to the amount of rain (Dinh et al 2022).
The Plant Indexes are common in analyzing drought, plant health, and predicting crop yield.In 1969, Kriegler and colleagues (Kriegler et al 1969) introduced the Normalized Difference Vegetation Index (NDVI), calculated by subtracting the red radiation from the near-infrared radiation and dividing the result by their sum.This index is frequently used to monitor plant growth (as cited in Bannari et al 1995, Huang et al 2021).NDVI has a strong correlation with vegetation properties such as fractional cover, leaf area index (Tian et al 2017), photosynthetic radiation absorption capacity, biomass (Los 1998, Zhu andLiu 2015), chlorophyll concentrations in leaves (Pastor-Guzman et al 2015), plant stress and plant yield (Vicente-Serrano et al 2016).NDVI is also utilized to examine the connection between vegetation conditions and major climate indices, including ENSO indices (Shuai et al 2016).The Vegetation Condition Index (VCI) is a time-normalized version of NDVI data and is often employed in assessing plant health and stress, including drought (Baniya et al 2019, Moussa Kourouma et al 2021).
Several studies have shown that vegetation growth is significantly impacted by temperature, precipitation, sunshine duration, and solar radiation (Nemani et al 2003, Chen et al 2014).Climate change has significantly impacted vegetation, contributing to one-third of the changes in crop yields each year (Ray et al 2015).The number of hours of sunshine, rainfall, and temperature changes affect plant growth and are reflected in the NDVI (Hulme 2001, Nicholson 2005, Wang et al 2007, Chu et al 2019).NDVI is highly sensitive to factors such as rainfall and sunshine hours.Regarding precipitation, NDVI has a strong response to low precipitation but is less sensitive to high rainfall (Nicholson 2005).The number of sunny hours is closely linked to NDVI, and when the rain is high, the number of sunny hours decreases, leading to an inverse relationship between NDVI and rainfall (Chu et al 2019).
Some plant indices use both red radiation and near-infrared radiation, including the Ratio of Vegetation Index (RVI) and Landsat Enhanced Vegetation Index (EVI).RVI is a simple index that has good predictive capabilities for leaf water content and biomass (Wei et al 2017).However, it has lower accuracy in reflecting vegetation conditions and drought than NDVI (Towers et al 2019).EVI is similar to NDVI in formula but is adjusted for atmospheric and canopy noise conditions, making it more sensitive in areas with dense vegetation.A disadvantage of EVI is that it can only be used with Landsat satellite imagery.
The Temperature Vegetation Dryness Index (TVDI) was developed by Sandholt et al (2002) as a tool to assess the level of drought stress in vegetation.The TVDI is designed to estimate the water content of the topsoil under different vegetation cover conditions and provides an evaluation of the soil's moisture level.The TVDI is a valuable tool in agriculture, particularly in arid and semi-arid regions, providing a means to monitor the effects of drought on vegetation growth and productivity.
Land Surface Temperature (LST) is a widely used indicator for assessing land cover conditions.However, due to variations in the statistical characteristics of LST across different months, the Temperature Condition Index (TCI) is often used instead.TCI is related to soil evaporation and moisture, and a high TCI value indicates healthy vegetation cover (Kogan, 2002).Some studies have shown that TCI is more accurate than VCI for assessing land cover conditions, while others have shown the opposite (Measho et al 2019, NourEldeen et al 2020).The Vegetation Health Index (VHI) is calculated using both VCI and TCI and is widely used for assessing vegetation growth and drought conditions, as well as predicting crop yields (Kogan et al 2012, Kloos et al 2021).In fact, Kogan et al (2012) successfully used VHI to predict yields of certain cereal crops.Several approaches have been employed for coffee yield forecasting, including Machine Learning and process-based models (Rodríguez et al 2011, Rahn et al 2018, Vezy et al 2020, Kouadio et al 2021, Dinh et al 2022).With the Machine Learning method, climate data (Dinh et al 2022), soil nutrient composition, and the plant indexes in the growing area (Kouadio et al 2018) were often used as predictors of coffee yield.Various Machine Learning algorithms, such as artificial neural networks and regressions, have been utilized to develop yield forecasting models (Pham et al 2022).To effectively predict coffee yield, it is essential to consider the complex interplay between various environmental factors and geographical variables.This includes vegetation type, soil type, land use, and geographical location.As demonstrated by Usman et al (2023) and Ray et al (2015), how vegetation responds to climate variability can vary greatly depending on these factors.Furthermore, as shown by Dinh et al (2022), the sensitivity of yield anomalies to weather can differ significantly between districts.This highlights the importance of considering sensitive areas when extracting yield predictors.
The present study aims to predict coffee yield through linear regression, utilizing the plant indexes as predictor variables.These plant indexes will be extracted from areas known to be particularly sensitive to climate variability to maximize the model's predictive power.

2. Methods
The main content of this study was to analyze the correlation between VCI, TCI, TVDI, and VHI with local coffee yields in Dak Lak province to identify sensitive areas for coffee yield, thereby creating a mask to extract predictive variables and forecast coffee yield.For convenience in presentation, we refer to them collectively as remote sensing-based indices (VIs).Since the trends of coffee yield and VIs contributed to the magnitude of the correlation coefficient between them, this effect should be considered in determining the sensitive regions for coffee yield.In addition to comparison, this study also predicted coffee yield using the predictor variables extracted from the APPA.The key steps are depicted in figure 2, with the following implementation details for each step.
1. Identifying the Coffee Yield and VIs trend in Dak Lak province.
2. Investigating the relationship between Coffee Yield and VIs while considering their trend.
3. Determining the optimal sensitive area and VIs for early warning of Coffee Yield.
4. Evaluating the effectiveness of the chosen sensitive area and VIs for early warning capability.

Materials
To make the study more robust, it was essential to ensure that the data used in the analysis were reliable and high-quality.Using satellite data, such as NDVI and LST, provided a large amount of data that could be used to investigate the relationship between climate variables and coffee yield.Using various MODIS products ensured that the data were of high quality, with the technique of maximum value composite being used to eliminate the effects of clouds and other atmospheric conditions.
We utilized satellite-derived vegetation indices (NDVI), land surface temperature (LST), and coffee yield data from 2000 to 2020 as the primary dataset for the study area.NDVI and LST are valuable remote sensing data enabling us to analyze vegetation and temperature patterns within the study area.NDVI, a widely used vegetation index, provides insights into the abundance of green vegetation in a specific region, while LST measures land surface temperature.Combining these remote sensing data with coffee yield information, we aimed to explore the relationship between coffee production and critical environmental factors such as temperature and vegetation coverage.For instance, integrating NDVI and LST data can effectively identify areas with optimal growing conditions for coffee cultivation and areas that might benefit from improvement to enhance coffee yield.
Regarding the data sources and specifications, the NDVI data were sourced from the MOD13A1 VI package, a MODIS product with a spatial resolution of 463.3 m x 463.3 m and a temporal resolution of 16 days.To ensure accuracy, we processed the data using the maximum value composite technique to mitigate the impact of clouds and atmospheric conditions.The LST data were also obtained from the MOD11A2 package with a spatial resolution of 926.6 m x 926.6 m and a temporal resolution of 8 days.Both the NDVI and LST data were obtained from the USGS Earth Explorer portal (https://earthexplorer.usgs.gov/).These datasets have undergone quality assessment, and grid cells with low data quality have been left as missing data.
The coffee yield data used in the study were collected according to the district boundary of Dak Lak in 2000.At that time, Dak Lak comprised 13 districts, with Ea H'leo, Krong Buk, Krong Nang, Cu M 'gar, and Buon Ma Thuot being the central coffee-growing districts that accounted for approximately 62% of the total coffee production.The coffee yield was selected from these five districts and was averaged for Dak Lak.The data on coffee yield were obtained from the Dak Lak Provincial Statistics Office.This data is in the form of averages for administrative units.
The land use map of Dak Lak was obtained from the Department of Agriculture and Rural Development of Dak Lak Province for the years 2000 and 2015.These maps indicated the presence of coffee trees in APPA without segregation.The data from 2015 showed that the coffee-growing area accounted for roughly half of the APPA.The APPA was determined by intersecting these two maps and was used as a source of predictive variables, as depicted in figure 1.

Calculating VIs
To facilitate the analysis and prediction of coffee yield, remote sensing data needs to have a common spatial and temporal resolution.In terms of time, monthly VIs were selected for calculation.The monthly NDVI data was obtained from the input 16-day resolution NDVI data by applying the Maximum Value Composite (MVC) method, which reduced the impact of atmospheric conditions (Holben, 1986, Li et al 2016, Li et al 2017, Chu et al 2019).The monthly average LST, calculated from the 8-day resolution Mod11A2 package, resampling of LST was performed using the Bilinear method to match the spatial resolution of NDVI.
VCI was the normalized value of NDVI over time (Kogan 1995), according to the formula below: Where NDVI i is the NDVI value of a particular pixel in a specific year at time i, while NDVI max and NDVI min are the maximum and minimum NDVI values over a period of analysis, respectively.The numerator represents the difference between the actual and minimum values of NDVI and reflects the state of plant growth and meteorological conditions.The denominator's maximum and minimum values indicate the best and worst conditions of change and partly reflect the local vegetation's condition.Hence, VCI encompasses both historical and real-time information about NDVI.VCI ranges from 0 to 100, with lower values indicating underdeveloped plants and higher levels of drought.In this study, VCI was calculated for each month from 2000 to 2020.TCI was created by Kogan (1995), representing the normalized value of LST over time.The formula used to calculate TCI is: Where LST i is the LST value at a pixel at the time i, and LST max and LST min are the maximum and minimum LST values for a period at the analyzed pixel, respectively.TCI is utilized to evaluate the effects of temperature on plant health.The TCI ranges from 0 to 100, with the highest values indicating the most significant plant stress when LST is at its lowest.According to the formula, the TCI was calculated monthly from 2000 to 2020.
VHI was developed to assess plant health based on a combination of VCI and TCI (Kogan, 2001) and is calculated as follows: VHI is an essential tool in the field of remote sensing and agriculture.The VHI combines the information from the NDVI and the LST, with the weight 'α' typically set to 0.5, to evaluate the health of vegetation.A low VHI value, less than 40, indicates plant stress, whereas a high value, greater than 60, signifies that the vegetation is in good condition.This indicator helps assess the contribution of temperature and vegetation growth to overall plant health.TVDI is based on the relationship between temperature and vegetation dryness and is used to monitor changes in soil moisture and vegetation health over time.
Figure 3 above displays the relationship between NDVI and LST, where the dry upper edge represents pixels with maximum water-stressed conditions for a range of the NDVI, and the lower wet edge represents a thoroughly wet area or a place with unlimited water.LST max is at the top of the Y-axis (associated with the dry edge), while LST min is near the origin of the X-and Y-axes, associated with the Wet Edge.This study considers the dry and wet lines as linear trends, as described in the works of Price (1990), Carlson et al (1994), Moran et al (1994), andRyu et al (2021).
These linear trends are expressed as follows: ( )

= --
Where LST obs refers to the observed Land Surface Temperature from remote sensing.

Determining the trend of VIs and coffee yield
This task not only aims to determine the trend of coffee yield in different locations and VHI in grid cells but is also used to analyze the magnitude of the correlation coefficient between them.The Sen's slope method and the Mann-Kendall non-parametric test were used to identify any significant trends.The Mann-Kendall nonparametric test is used to determine the trend of a time series of data.Given a time series (x 1 , x 2 , K, x n ) with x i representing the data at the time i, the statistic of the Mann-Kendall test (S) is defined as: ( Where Var(S) is the variance of S and is calculated as follows: Where g is the number of tied groups, and t p represents the number of observations in the pth group.Z follows a normal distribution and is in the set N (0, 1).
In the Mann-Kendall test, the null hypothesis (H 0 ) is rejected, indicating the presence of a trend in the dataset when the calculated Z value is greater than the critical value (Z crit ).The critical value (Z crit ) is obtained from the standard normal distribution table.Conversely, if the calculated Z value is less than the critical value, it means that there is no significant trend in the dataset.
In Sen's slope method, the magnitude of the trend (β) is determined as the median of the series of n(n−1)/2 elements {(x j -x k )/(j-k), where k = 1, 2, K, n−1, j > k}.By such a definition, β has the same sign as Z.

Determining the most suitable sensitive area and VIs for early warning of coffee yield
The aim of this step was to determine the remote sensing indices and grid cells (pixels) that were sensitive to coffee yield, creating a mask to extract VIs as predictors.The trend of coffee yield and VIs in the grid cells was related to the correlation coefficient between them.The higher the correlation coefficient, the more sensitive the grid cell was to fluctuations in coffee yield.It was essential to identify the grid cells that were most sensitive to changes in coffee yield.
Given that Y represents the coffee yield, I represents the value of certain VIs at a grid cell, and t is the time variable, the multiple correlation coefficient is calculated as follows: Where R YIt represents the multiple correlation coefficient between Y and I and t, R YI , R Yt , and R It are the correlation coefficients between Y and I, Y and t, and I and t, respectively.Based on the values of R YIt , the sensitive area was defined to encompass the largest possible area to ensure stability, and the average VIs over this area had the strongest relationship with coffee yield.

Early forecast of the coffee yield
The coffee yield was forecasted using the stepwise multiple linear regression technique and validated through the Leave-one-out cross-validation (LOOCV) method to verify the assumptions.The process included adding or removing potential explanatory variables through statistical significance testing after each iteration.The equation for calculating coffee yield is as follows: ( ) Ys a bt c X 11 Where t represents the time variable, and X i is the predictor obtained from the remote sensing product VIs.Ys represents the forecasted coffee yield, a, b, and c i are the coefficients deduced from the multiple linear regression of the training dataset, and b t considers the trend of Y and X i .The value of the time variable, t, was determined by the ordinal number of the year of analysis, with data ranging from 2000 to 2020, and has a value from 1 to 21.The values of X i were extracted from the VIs in two ways: (1) from the mask where the VIs had the best relationship with coffee yield and (2) from APPA and local boundaries (as shown in figure 1).In the first method, because the study area was not large, only one mask was used for all districts.With the second method, each locality would have its mask.The X i values were the average VIs in the sensitive area for one to several months before the coffee harvest time.These X i values were used separately to compare the predictive capacity of each technique.
To the high-reliability model, equation (11) was not built separately for each locality.The coffee yield of each local area would be converted to a standard variable (with a mean of 0 and a standard deviation of 1) and then combined into a more extended data series.With 21 years of data, five districts, and one province, the length of the data series included in the analysis will be 21 × (5+1) = 126.
The accuracy of the coffee yield forecast was evaluated using the LOOCV technique.The predicted series had the same length as the observed series, with each iteration discarding one year of data for 120 observations in the training and testing process.This method improved the reliability of the evaluation results compared to using a single test set.
To validate the accuracy of the coffee yield prediction and identify the need to add or remove variables, four statistical parameters were used, including the correlation coefficient (R), the Root Mean Square Error of Prediction (RMSE), the Mean Absolute Percentage Error (MAPE), and Willmott's index of agreement (d).These parameters are defined as follows: In the equations, n is the length of the series, and Ȳ is the average of Y.These values can be in the normalized form or have been converted to standard form.

The trends of coffee yield and VIs
Based on Sen's slope method and the Mann-Kendall non-parametric test, the results of the coffee yield prediction for Dak Lak province are presented in table 1.The results of the Mann-Kendall test showed that, except for Buon Ma Thuot city, the coffee yield in Dak Lak province tended to increase clearly with a confidence level of 90% or greater.The average increase was 0.029 tons/year/ha, meaning that from 2000 to 2020, coffee productivity had increased by 0.58 tons per hectare.
From the VCI, TCI, VHI, and TVDI data, the trends in VIs were calculated for four consecutive months, as shown in figure 4. The colored areas in figure 4 have a significance level of 0.01.This figure illustrates that these indices exhibit a clear trend from January to April.During these months, the VCI, TCI, and VHI tend to increase by 2% per year, accounting for about 10% of the area.A comparison with figure 1 reveals that the areas with a clear trend in the grid cells are located in perennial crop areas.This area previously suffered from a lack of water for cultivation during the dry season.Still, the situation has improved with the development of reservoirs and irrigation systems, leading to significant changes in the VIs.The increase in vegetation cover is not as pronounced from May to December, which may be due to the relatively high rainfall in the region from May to November.Some areas may also experience a decrease in vegetation cover throughout the year, which could be related to changes in land use.
Table 2 presents the statistical results for the area, indicating the areas where the VIs tend to increase or decrease with a significance level of 0.01.From January to April, the VCI, TCI, and VHI show a substantial increase of 30% or more.From September to December, there was a significant rise in TVDI, accounting for 37.6% of the area.The remaining portion of the table displays fluctuations in the area, with increases or usually decreases of less than 20%.VIs exhibit opposite trends of increase and decrease in many places, which is mainly due to human impacts.According to Hiep et al (2023), areas invested in seed varieties, irrigation, and fertilizers show an increase in VCI and TCI, whereas those without investment exhibit a decrease in VCI and TCI.As noted by Dinh et al (2022), the improvement in coffee yield was the result of human impacts.In the past decade, coffee farmers have made substantial advancements in their agricultural practices compared to the previous ten years.
The significant increases and decreases in VIs were partially attributed to human impacts such as changes in land use, farming methods, and increased investments in improved plant varieties, which have affected vegetation cover.This could result in a high correlation coefficient between coffee yield and VIs in many areas, but the correlation is not related to climate variability.Therefore, it is important to take into account the impact of these trends when identifying areas with high coffee yields.

The correlation coefficient between coffee yield and VIs
Our study constructed maps of the spatial distribution of the correlation coefficient between coffee yield and VIs.The average yield for the entire province was used as the coffee yield in the analysis.The results were used to assess the impact of the trends in VIs and the coffee yield on the correlation coefficient between them.
The correlation coefficient maps were generated using the average coffee yield data from Dak Lak and VIs collected before the coffee harvest time.The coffee harvest takes place in October, so the VIs selected for analysis were from November of the previous year to October of the harvest year.Figure 5 presents the results of the correlation coefficient calculation using VIs from January and April.The gray areas in the figure have R-values that fall below the 0.05 significance level.A comparison with the map in figure 4, which covers the period from  January to April, reveals that the regions with a high correlation coefficient are also the areas where the VIs showed the most noticeable changes.By sorting the correlation coefficients between coffee yield and VIs in figure 5 in descending order and then calculating the frequency for each month, we generated figure 6, which shows the values of R and the frequency of occurrence at the 10% and 90% thresholds.
Corresponding to a 90% occurrence frequency (as shown in figure 6(b)), the values of R were negative in all months.Except for TVDI from January to April, the values of R were usually less than Rcr at a significance level of 0.05.Hence, figures 6(a) and (b) indicate that the period from January to April was the most sensitive time for coffee yield.The mean value of R over this period, with a 10% threshold for VCI, TCI, and VHI, was 0.65, 0.66, and 0.69, respectively, with a 90% cutoff and −0.54 for TVDI.

Multiple correlation coefficient between coffee yield and VIs and time variable
The multiple correlation coefficient between coffee yield and VIs, along with a time variable, was used to eliminate the effect of trends in identifying the coffee yield-sensitive areas.In this statistical analysis, the average coffee yield of Dak Lak was the dependent variable, and the values of VIs at grid cells and a time variable were the independent variables.For simplicity, the time variable was taken as the ordinal number of the analysis year.The grid cells with high multiple correlation coefficients were selected to determine the coffee yield-sensitive areas.
We obtained a map showing the spatial distribution of correlation coefficients when calculating the multiple correlations between coffee yield and the independent variables at each grid cell.Figure 7 represents the map for February.By sorting the values of the correlation coefficients in these maps in descending order and then  calculating the frequency, we obtain figure 8, which displays the R-value corresponding to the frequency of occurrence with 10% and 50% thresholds.
The comparison between figures 6 and 8 indicates that the multiple correlation coefficient increased significantly when adding the linear component.According to figure 8, the highest correlation coefficients were observed in February and March for TCI, while the other indexes had the highest values in January and February.The correlation coefficient was observed to increase rapidly from November to January and then decrease from February to May.During this period, the correlation coefficient underwent the most apparent change, growing rapidly from November to January and falling quickly from February to May.The average correlation coefficient value between coffee yield and VCI, TCI, VHI, and TVDI from December to March was 0.78, 0.77, 0.79, and 0.76, respectively.These results suggest that the selected variables correlated with the coffee yield during the critical period before harvest.
To determine the sensitive area, it was essential to analyze the correlation between coffee yield and VIs over some time greater than one month.Figures 6 and 8 show that the periods from January to April and December to March exhibited the highest correlation coefficients.Thus, the average VIs during these periods were chosen for analysis.The values of the multiple correlation coefficients during these periods can be seen in figure 9.
Figure 9 demonstrates that data from January to April yields higher correlation coefficients.When analyzing the correlation coefficient based on the frequency of occurrence with a threshold of 10%, the values of R using VIs from December to March were found to be 0.80, 0.77, 0.80, and 0.76, respectively.Similarly, the values of R using data from January to April were determined to be 0.81, 0.78, 0.82, and 0.77.Thus, the period from January to April was considered the most appropriate time for determining the coffee yield-sensitive area.

Determining the coffee yield sensitive area
The process of determining the coffee yield sensitive area involved creating masks to extract data from figure 9 for the period January to April.The goal was to identify the area with the best relationship with the coffee yield, which would be designated as the sensitive area.The following steps were taken to achieve this:  1. Create masks to extract data.These masks correspond to the R values in figures 9(e)-(h), which are higher than a given occurrence threshold.These thresholds corresponded to the occurrence frequencies of 30%, 25%, 20%, 15%, 10%, and 5% in descending order of R.
2. Determine the average VIs for these masks from January to April of the study year.
3. Calculate the multiple correlation coefficient between coffee yield, a time variable, and the value of VIs obtained in step 2.
4. Implement the warning zone selection based on the results of the previous steps.
Figure 10 displays the results of a sensitivity analysis of the correlation between coffee yield, a time variable, and the VIs extracted from the mask, as represented by the multiple correlation coefficient.Ranking the indices in ascending order of sensitivity to coffee yield showed that VHI was at the top, followed by VCI, TDVI, and TCI.The highest correlation coefficient, except for TCI, was observed in January and February.Furthermore, figure 10 demonstrates that using a smaller percentage of selected grid cells results in a higher correlation coefficient.
The results from the analysis indicate that VHI was the most appropriate index to predict coffee yield. Figure 10 shows that when the selected grid cells were reduced below 8%, the correlation coefficient increased, but not significantly.Hence, this 8% threshold was chosen to define the sensitive area (figure 10(b)).In figure 10(a), the mask area ranges from 5% to 30% with a step of 5%.In figure 10(b), due to the leading role of VHI, the mask is taken in more detail with an area ranging from 2% to 30% and a step of 1%.The results of the sensitive area determination using VHI and an 8% threshold are presented in figure 11.
The coffee yield-sensitive area occupied an area of 1,042 km 2 , which accounted for approximately 50% of the coffee-growing region in Dak Lak province in 2020.As shown in figure 11, the sensitive area was mainly situated within the APPA.The sensitive area was characterized by a high elevation and was located farther away from the main river than the APPA.Furthermore, most of it was located on top of hills and where it did not  receive water from neighboring regions.The data regarding the topographical elevation and slope for APPA, the monitoring region, and the entire Dak Lak province are displayed in figure 12 and table 3. Figure 12(a) illustrates that the elevation density line of Dak Lak province and APPA have a typical highest peak at the exact location, with a height of approximately 455 meters.This peak represents the flattest area utilized for growing crops that VIst throughout the year.
Examining the Differences in Terrain Elevation and Slope between the Sensitive Area and APPA: figure 12(a) shows that the terrain elevation in the sensitive area was higher than in APPA, with a median difference of 159 m and a mean value of 90 m. Figure 12(b) and table 3 indicate that the slope of APPA had the highest density of 2.4°a nd a median of 3°.In contrast, the sensitive area was observed to have a steeper slope, with a median difference of 0.9°and a mean difference of 0.5°compared to APPA.The position of APPA in relation to surface water sources was found to be more favorable than the sensitive area based on the results in figure 13.This was   determined by constructing a grid of Euclidean distance to the river of the same size as the grid of the sensitive area.
Comparing the distance of APPA and sensitive area to the river, figure 13 and table 4 reveal that APPA is often situated along the river and is mainly concentrated within 0.7 km of the river.The mean value of the distance of the APPA grid cells from the river was found to be 2.1 km, with 90% of the sites being less than 6.6 km away from the river.In contrast, the sensitive area was found to be much further away, with a median distance of 16.6 km from the river, which is 14.5 km farther than that of APPA.The distribution of the grid cells of the sensitive area from the river showed that many grid cells were far from the river, with 10% located more than 25.8 km away.
The distribution of soil types in Dak Lak, APPA, and the sensitive area is depicted in figure 14.Dak Lak is home to four predominant soil types: Ferric Acrisols (Af), Orthic Acrisols (Ao), Rhodic Ferrasols (Fr), and Pellic Vertisols (Vb).These soil types are suitable for growing perennial and annual crops, with Af and Ao being ideal for agriculture.Nearly 96% of Dak Lak's land is covered by Af, Ao, and Fr soils.The dominant soil types in APPA are Af and Fr, making up 36.6% and 47.6% of the area, respectively.These soils are well-suited for coffee cultivation.On the other hand, the sensitive area is primarily comprised of Fr soil, which accounts for 71.1% of the region.However, Fr soil is lower in nutrients and has fewer water-retention capabilities than Af and Ao.Hence, the sensitive area has less fertile soil and a limited water-holding capacity compared to Dak Lak and APPA.
As indicated by the analysis results, the sensitive area has steeper terrain, is located farther from the river, and has less fertile soil with a lower water-holding capacity than APPA.It is also situated at the highest elevations within the APPA, away from the surface water source, making it highly reliant on natural rainfall.The steep slopes result in increased runoff and decreased infiltration, leading to less water being stored in the root zone.Moreover, the poor water retention capacity of the soil in the sensitive area makes it susceptible to drought.This heightened vulnerability to adverse weather conditions is likely why it was selected as a monitoring area to observe the effect of weather on plant growth and yield, especially for plants that require a lot of water, such as coffee trees.
3.5.The forecasting of coffee yield 3.5.1.Utilizing VIs extracted from the sensitive area In this section, the coffee yield was predicted by multiple linear regression with variables taken from VIs and a time variable.Based on the mask of the sensitive area, the mean values of the VIs in the masked area were used as the predictor variable.These variables were in the form of monthly data.The average value over several months was also included to build the regression equation.Since October to December was the coffee harvest period, and January to April was when VIs of the monitoring area were very sensitive to coffee yield, VIs from January to April were selected to build the prediction equation.Thus, the forecasting equations would be made with the forecast period in advance from 6 months (using the forecast variable of January to April) to 9 months (using the forecast variable of January).
The forecasting equation was formulated to predict coffee yield for all districts, considering the standard variable form.The variables that were selected for this equation were based on the stepwise regression method, and their efficiency was evaluated using the LOOCV technique.Table 5 displays the results of the predictor variables participating in the prediction equation and their statistical coefficients.The predictor variables VHI i-j and TDVI i-j in this table represent the average values of VHI and TDVI from month i to j, obtained from the sensitive area, and t is the time variable.The values for i and j were chosen from January to April.
According to the results presented in table 5, most of the forecasting models (except for the 9-month advance forecast period) consisted of three variables.The first selected variable in all models was VHI, indicating its critical role in determining the quality of the forecast.The second variable chosen was TVDI in the early forecast periods with fewer than nine months.The final variable in these models was time (t), which had less influence than VHI and TVDI.It is worth noting that VCI and TCI were not included in the forecasting equation, as their information was already contained within VHI.The results also showed that as the forecast period decreased, the value of the adjusted R-square increased significantly, especially from 9 months to 8 months.The coffee yield prediction results were assessed using the LOOCV method.This approach offers an advantage over a single test set, as it involves using all observed data in both the training and evaluation phases, leading to a more trustworthy evaluation.Repeating the LOOCV process many times, a series of forecasts was generated that was the same length as the observation series.Figure 15 displays a scatterplot comparing the model's predicted coffee yields to the observed results for a 6 to 9-month period.The coffee yield was maintained as a standard variable without units in this representation.The accuracy of the forecast was evaluated after converting the coffee yield from the forecasting equation into observed data.This evaluation was conducted for each administrative unit.Figure 16 compares the predicted and actual productivity for the 6-month forecast period within the organizational units.The assessment of prediction accuracy was based on the following metrics: R, RMSE, MAPE, and d.The results of this evaluation, as indicated by these coefficients, are displayed in table 6.The results in table 6 indicate that the prediction quality of coffee yields was relatively high.As the forecast period decreased, the accuracy of the forecast improved, with the most noticeable improvement being seen when reducing the forecast period from 9 months to 8 months.This improvement was attributed to the cumulative period of the predictor variables becoming more informative.Using VHI 1-2 or VHI 1-3 instead of VHI 1-1 for 9-month forecasts resulted in more accurate predictions.Furthermore, the improvement in forecast quality was due to the addition of the TVDI i-j and j -i components, which became more influential as the forecast time decreased.
For a 9-month forecast period, the parameters R, RMSE, MAPE, and d were in the range of 0.6-0.8,0.16-0.25 ton/ha, 6.2%-9.1%,and 0.86-0.93,respectively.When the forecast period was reduced to 6 months, these coefficients had values in the range of 0.63 -0.9, 0.12-0.17ton/ha, 4.9%-6.9%,and 0.91-0.97.For a forecast period of more than six months but less than nine months, the MAPE was in the range of 4.9% to 7.5%.
The coefficients for the regression equation predicting coffee yield in table 6 were for the district and the entire Dak Lak province.The accuracy would be improved by retaining the selected variables in this table and constructing a new forecasting equation for each locality.In 2022, Dinh et al conducted a study using multiple linear regression and regularization techniques such as PCA and leave-one-out to assess the impact of weather on coffee yield in five provinces in the Central Highlands.The model predicted coffee yield three to six months before the harvest, with validation results indicating an R coefficient value between 0.36 and 0.6 and RMSE values between 0.16 and 0.29 ton/ha.
Our study found that using VIs extracted from the identified sensitive area improved the accuracy of the prediction compared to previous studies.Our analysis also had a more accurate prediction and could make the forecast three months earlier than the study by Dinh et al.The sensitive area was highly suitable for extracting the VIs as predictive variables through the evaluation of forecast quality.The VHI variable combines information from both NDVI and LST, while TVDI represents the soil moisture content.These variables showed the area's sensitivity to rainfall, temperature, and soil moisture levels.VHI and TVDI variables were collected between January and April, allowing for an early prediction of the coffee yield, 6 to 9 months before the harvest.This period was crucial for the growth and development of coffee plants, including the formation and development of flower buds and flowers.Hence, the weather conditions during this time significantly impacted productivity, and dry and insufficient water conditions could negatively affect the coffee tree's growth and development (Dinh et al 2022).

The coffee yield forecast is based on VIs data obtained from APPA by each district
The comparison of the forecast accuracy between predictor variables obtained from APPA by each district and those obtained from the sensitive area was performed in this study.The coffee yield prediction was carried out using multiple linear regression with variables derived from VIs and a time factor.The selection of variables and the number of variables were based on the stepwise regression method and the evaluation of forecast quality using the LOOCV technique.
Table 8 displays the results of the statistical analysis of the predictor variables in the forecasting equation.In comparison to table 5, it was found that the accuracy of the forecasting equation decreased significantly when

Conclusions
Through analysis and trend assessment, coffee productivity in Dak Lak province increased by 0.58 tons/ha from 2000 to 2020.Also, during this period, VIs tended to change markedly in the months of the dry season.In the long-term, crops, VCI, TCI, and VHI grew to increase markedly; the part with an increasing significance level of 0.01 accounts for nearly 30% of the area, with a growing trend of 2%/year, accounting for about 10% acreage.Thus, the change in coffee yield is mainly due to human impacts such as changing farming methods and developing water resources; changes in VIs are primarily related to land use changes and improved farming techniques.
The study determined the coffee-sensitive area using the multiple correlation coefficient between coffee yield, VIs, and a time variable in grid plots before harvest.This method accounted for the influence of trends on correlation coefficient magnitude.The resulting coffee yield-sensitive area, covering 8% of Dak Lak province, is mostly within long-term agricultural cropland.This area has a higher elevation, slope, and distance to the river than the general conditions of the whole long-term agricultural crop growing area, along with poor soil fertility and lower water holding capacity.
As per the correlation coefficient analysis between coffee yield and VIs, it has been demonstrated that the grid cells exhibiting high correlation coefficients also show significant changes in VIs.However, the correlation coefficient may be relatively high due to the rapid increase in coffee yield and marked changes in VIs trend in some regions, without indicating the impact of abnormal weather.This characteristic was leveraged to identify a sensitive area for coffee yield and extract VIs as predictive variables.
The evaluation results indicate high forecast reliability by utilizing the sensitive area as a mask to extract VIs as predictor variables and performing yield prediction via step-by-step linear regression and the Leave-one-out cross-validation technique.During a 6 to 9-month forecast period, Willmott's index of agreement ranges from 0.85 to 0.97 and MAPE from 4.9 to 7.5%.Compared to extracting variables from APPA based on administrative boundaries, our method using the sensitive area as a mask demonstrates higher forecast accuracy, with d between 0.81-0.94and MAPE between 6.3%-12.0%.Our research also provides higher accuracy in predicting coffee yield for Dak Lak and neighboring provinces, with a longer early forecast time.
As stated in the analysis, VHI is the most sensitive index, with the period between January and March showing the best relationship between VHI and coffee yield.In addition to VHI, TVDI is the second selected variable in the forecasting equations, with VHI being calculated from NDVI and LST and TVDI being related to topsoil water content.The dry season months, including temperature, rainfall, soil moisture, and stored surface water, significantly affect coffee yield in Dak Lak.
The findings of this research can aid decision-makers in formulating and executing sustainable agricultural policies and strategies, as well as enhancing resource allocation and management in coffee-growing areas.Additionally, this study can serve as a reference point for future research on coffee yield prediction and other regional crops to improve coffee production and farmers' living conditions in Dak Lak and the Central Highlands region.
Vietnam and the Government of the Wallonia-Bruxelles during the period of 2019-2022.We express our sincere gratitude to the entities involved in the implementation of this project.
We extend appreciation to the Dak Nong province for generously providing the essential data required for this study.Detailed population data for each commune within Dak Nong province were collected and utilized.Additionally, the valuable soil and land use data were acquired from the Department of Agriculture and Rural Development of Dak Nong province.The research also leveraged satellite data, which were sourced from the United States Geological Survey website (accessible at: https://earthexplorer.usgs.gov/)and were retrieved on October 25, 2022.
Our heartfelt thanks go out to all these organizations and entities for their indispensable contributions, without which this study would not have been possible.

Figure 1 .
Figure 1.Location and topography of Dak Lak province, Agricultural Perennial Planted Area (APPA), and central coffee growing districts.

Figure 2 .
Figure 2. Steps to build the coffee yield forecasting equation.The two dashed line boxes represent two different ways of creating X i .
b, c, and d represent the coefficients of the linear regression equations for LST max and LST min , calculated using NDVI data.Based on the LST min and LST max values, TVDI is determined as follows:

Figure 5 .
Figure 5. Correlation coefficients between VIs in January and April with Coffee Yield.

Figure 6 .
Figure 6.Correlation coefficients of VCI, TCI, VHI, and TVDI with the coffee yield corresponding to the threshold of occurrence: (a) 10% and b) 90%.

Figure 7 .
Figure 7. Multiple correlation coefficient between coffee yield and VIs and a time variable in February.

Figure 8 .
Figure 8.Multiple correlation coefficient between coffee yield and VIs and a time variable with (a) 10% and (b) 50% occurrence frequency, respectively.

Figure 9 .
Figure 9. Multiple correlation coefficient between coffee yield with a time variable and mean value of VIs for the periods from December to March and from January to April.

Figure 10 .
Figure 10.(a) Multiple correlation coefficient between coffee yield with a time variable and average value of VIs extracted by masks for each 1% of the area between 2% and 30%, (b) The same as (a) but only for VHI.

Figure 11 .
Figure 11.The coffee yield sensitive area.

Figure 12 .
Figure 12.Density distribution of (a) elevation and (b) terrain slope.

Figure 13 .
Figure 13.Density distribution of distance from APPA and sensitive area to the river.

Figure 14 .
Figure 14.Percentage area of the main soil types for the whole Dak Lak, APPA, and sensitive area.

Figure 15 .
Figure 15.Scatter plot of observed versus model-simulated coffee yield.

Figure 16 .
Figure 16.Scatter plot of observed coffee yield versus model-simulated yield, with a 6-month advance forecast.

Table 1 .
Sen's slope of coffee productivity in the period 2000-2020.

Table 2 .
Percentage of uptrend and downtrend areas.

Table 3 .
Topographic elevations and slopes correspond to some occurrence frequencies.

Table 4 .
Distance from APPA and sensitive area to the river.

Table 5 .
Coefficients of the linear regression equation simulating coffee yield with the predictor variables VIs extracted from the sensitive area and a time variable.
Table 7 demonstrates this improvement when implemented in the Krong Buk district.Compared to table 6, the value of MAPE decreased by approximately 2%.In recent years, many studies have aimed to predict coffee yield in Vietnam.In 2018, Kouadio et al used the Extreme Learning Machine model and soil factors such as organic matter, available potassium, boron, sulfur,

Table 6 .
Accuracy of coffee yield prediction results for five districts and the entire province of Dak Lak with predictor variables extracted from the sensitive area.

Table 7 .
The accuracy of coffee yield forecast results when the forecasting equation is built specifically for the Krong Buk district., nitrogen, exchangeable calcium, magnesium, and pH as predictors to forecast the coffee yield in Lam Dong province.The results showed an RMSE of 0.496 tons/ha and a MAPE of 7.9% in the independent testing phase.In 2021, Kouadio et al created a Robusta model that utilized weather data and previous growing season information to simulate the growth and development of coffee plants.The model was applied to five provinces in the Central Highlands, yielding RMSE and MAPE values ranging from 0.24 to 0.33 tons/ha and 9% to 14%, respectively.Willmott's consent index was greater than or equal to 0.71 in the model for three of the five provinces.The model was also tested with remote sensing and climate data, resulting in MAPE and RMSE values similar to or less than 12% and 0.29 tons/ha, respectively.

Table 8 .
The coefficients of the linear regression equation simulating the standard variable of coffee yield with the predictor variables VIs are extracted on APPA for each district.