Maize Production Forecasting in Iraq: A Box-Jenkins Approach for the Period of 2022-2026

This study aims to forecast the production of Iraq’s maize crop using Autoregressive Integrated Moving Average (ARIMA) models. Using semi-annual data for the period 1980-2021, the production of the maize crop was forecasted for 5 years, from 2022 to 2026. The data were obtained from the Ministry of Planning and International Cooperation, Planning and Monitoring Department, Agricultural Statistics Department, Iraq. The forecasted production of maize in Iraq during the period 2022-2026 is characterized by fluctuations between highs and lows, similar to the production of maize during the study period. The highest expected production of maize in Iraq is in the second half of 2023, reaching 585 thousand tons, while the lowest expected production is 500 thousand tons in the first half of 2022. This study provides insights that can guide the formulation of effective policies regarding maize production, pricing, and consumption in the country.


Introduction
The agricultural sector is one of the most vital industries in any country's economy, as it provides food, employment opportunities, and raw materials for various industries. In Iraq, this sector is the largest non-oil economic sector, contributing 5-10% of the total GDP and employing around 25% of the workforce. Therefore, it is crucial to develop the agricultural sector comprehensively to ensure overall development in Iraq.
Maize is a significant food source globally, used to produce maize oil, starch, flour, and animal feed. It is also rich in essential nutrients such as vitamins, minerals, and dietary fiber, which promote digestive health and help prevent diseases. Maize is the third-largest crop in the world, cultivated mainly in North and South America, Eastern Europe, Russia, China, India, and South Africa. In Iraq, it ranks fourth after wheat, rice, and barley. Despite the availability of fertile land, water, a suitable climate, and a workforce, maize productivity in Iraq fluctuates between highs and lows, and the crop's potential is underutilized. Planning to increase production requires accurate forecasts, which in turn require reliable data and advanced statistical methods. Therefore, it is important to forecast the production of maize in the coming years and to reach estimates that can be relied upon in order to develop appropriate policies to increase production.
In recent years, there has been increasing interest in developing accurate forecasting models for agricultural production, including maize, to provide farmers, policymakers, and other stakeholders with timely and reliable information for planning and decision-making. One of the widely used methods for time-series forecasting is the Box-Jenkins methodology, also known as ARIMA modelling. Despite the significance of maize production in Iraq, few studies have forecasted its production using advanced statistical methods. Previous studies mainly focused on examining the factors affecting maize production in Iraq, such as climate change, soil quality, and irrigation systems. A study by [1] used regression analysis to investigate the relationship between maize production and various environmental factors in Iraq. Another study by [2] used a neural network model to predict maize production in the Diyala Governorate in Iraq. However, to the best of our knowledge, studies that apply ARIMA or other time-series models to maize production in Iraq remain scarce.
In this study, we aim to apply the Box-Jenkins methodology to develop a forecasting model for maize production in Iraq from 2022 to 2026, based on historical data on maize production. Specifically, we use the ARIMA model to analyze the time-series data and identify the underlying patterns and trends in maize production, and then use this information to generate forecasts of future production. The results of this study are expected to provide valuable insights into the behavior of maize production in Iraq and to help stakeholders in the agricultural sector make informed decisions about planting, harvesting, and marketing of maize. Moreover, the proposed forecasting model could be extended and applied to other crops and regions, thus contributing to the development of effective agricultural policies and strategies for sustainable food production.
The rest of this study is organized as follows. Part two discusses previous studies that forecasted maize production using the Box-Jenkins methodology, while part three provides a brief overview of maize production in Iraq. Part four presents the Box-Jenkins methodology, and part five presents and analyzes the results. Part six concludes with a summary of the study's findings and recommendations.

Literature Review
The Box-Jenkins methodology, also known as ARIMA modeling, is widely used in time series analysis for forecasting various economic and financial variables. Numerous studies have applied this methodology to forecast maize production in various countries. The results have generally shown that the Box-Jenkins approach produces accurate forecasts and supports decision-making for agriculture-related policy and planning. For example, [3] modelled and forecasted maize prices in South Africa using ARIMA models, which performed well and were recommended for use by policymakers. Similarly, [4] forecasted maize yield in Swaziland using Box-Jenkins models, which were found to be effective in supporting efforts to improve maize yield and maximize profits for farmers. In another study, [5] used ARIMA models to analyze and forecast the production of maize in Bangladesh. The study showed that the models were effective and recommended their use for making informed decisions about maize production. In contrast, [6] compared Box-Jenkins and ARIMA models for forecasting maize yields in Ghana. The study found both models to be effective, but the Box-Jenkins model was slightly more accurate and, therefore, recommended for use.
Meanwhile, [7] found that the best model for forecasting maize production in Nigeria was ARIMA (2,1,1), which provided accurate short-term forecasts. However, the model had limited accuracy for longer-term forecasting, and the authors suggested that future studies incorporate exogenous variables for better results. [8] also found ARIMA (1,1,1) to be the best model for forecasting maize production in Zimbabwe, and recommended it for decision-making purposes. [9] found that both ARIMA and artificial neural network (ANN) models were effective for forecasting maize production in South Africa. The authors recommended the use of both models for maize production forecasting, with ARIMA being more suitable for short-term forecasting and ANN being more suitable for longer-term forecasting. They suggested that future research investigate the impact of exogenous variables, such as weather and soil data, on maize production. [10] found that ARIMA (2,1,2) was the best model for forecasting maize production in Iran. The model provided accurate short-term forecasts, but its accuracy decreased for longer-term forecasts. The authors recommended using the model for short-term forecasting and suggested that future studies explore the use of exogenous variables, such as weather and irrigation data, to improve the accuracy of longer-term forecasts. [11] found that ARIMA (2,1,2) was the best model for forecasting maize production in Uganda. The model provided accurate short-term forecasts, but its accuracy decreased for longer-term forecasts. The authors recommended using the model for short-term forecasting and suggested that future studies investigate the impact of exogenous variables, such as weather and soil data, on maize production.
There have been several studies on maize production forecasting in Arab countries using the Box-Jenkins methodology. For instance, [12] applied ARIMA models to forecast maize production in Sudan and found that the best model was ARIMA (1,1,1). The authors recommended using the model for short-term forecasting and suggested that future studies consider incorporating exogenous variables for better results. Another study by [13] used the Box-Jenkins methodology to forecast maize production in Iraq. The study found that the best model was ARIMA (1,1,1) because it provided accurate short-term forecasts. The authors recommended the use of the model for decision-making purposes and suggested that further research investigate the impact of exogenous variables on maize production.

Maize Production in Iraq for the Period 1980-2021
The production of maize in Iraq during this period was characterized by fluctuations between highs and lows, ranging from a minimum of about 28.1 thousand tons in 1983 to a maximum of about 831.3 thousand tons in 2013, as shown in Figure 1. The production of maize in Iraq can be divided into three periods. The first period is from 1980 to 1988, the second period is from 1989 to 2002, and the third period is from 2003 to the present.
The first period was characterized by low maize production compared to the other periods, as Iraq was at war with Iran during this period, and the state devoted all of its resources to the war. The average maize production during this period was 46.5 thousand tons, and the lowest production was recorded in 1983 at 28.1 thousand tons, which is the lowest production during the entire study period. The highest production of this crop during this period was 77.2 thousand tons in 1988.
As for the second period (1989-2002), Iraq suffered from a comprehensive economic embargo, which led Iraq to focus on agriculture to meet its food needs. This period was characterized by an increase in maize production, with the lowest production during this period being 101.1 thousand tons in 1995, and the highest production being 578.6 thousand tons in 1995, with an average production of 285.4 thousand tons. However, the production of this crop also fluctuated between highs and lows as farmers turned to wheat and barley instead of maize.
The third period, which extends from 2003 to the present, witnessed the occupation of Iraq and fundamental changes in the political system. The production of maize during this period also fluctuated, with an average production of 413.1 thousand tons, the lowest production being 235.7 thousand tons in 2002, and the highest production being 831.3 thousand tons in 2013, which is the highest production during the whole study period.

Data and Methods
The main objective of the study is to forecast the production of maize in Iraq for the period 2022-2026 using the Box-Jenkins methodology. This methodology was introduced in 1970 by [14] for time series data and is based on the Autoregressive (AR) Model, the Moving Average (MA) Model, and the Autoregressive Moving Average (ARMA) Model. The semi-annual data from 1980 to 2021 on the production of maize in Iraq were collected from the publications and records of official sources, such as the Ministry of Agriculture, the Agricultural Statistics Department, the Labor Force, the Central Statistics Organization, and Information Technology. The natural logarithm of maize production was taken before modeling.

Box Jenkins Approach
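The Box-Jenkins method is a time series analysis technique that models a time series using Autoregressive Integrated Moving Average (ARIMA) models. An ARIMA model combines autoregressive (AR) and moving average (MA) terms applied to a series that has been differenced to make it stationary, as in the following equation:

$$\Delta^{d} y_t = c + \sum_{i=1}^{p} \phi_i \, \Delta^{d} y_{t-i} + e_t + \sum_{j=1}^{q} \theta_j \, e_{t-j}$$

where $y_t$ is the series at time $t$, $\Delta^{d}$ denotes differencing $d$ times, $c$ is a constant, $e_t$ is the error term, $\phi_i$: autoregressive parameters, and $\theta_j$: moving average parameters.
The orders of the ARIMA model are denoted by p, d, and q, where:
• p: the order of the autoregressive (AR) component
• d: the order of differencing required to make the series stationary
• q: the order of the moving average (MA) component
The Box-Jenkins method involves four stages: identification, estimation, diagnostic checking, and forecasting.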

Identification
This is the stage of identifying the model, i.e. identifying the orders (p, d, q) in order to formulate a number of candidate models that accurately describe the stationary series. This is done through the following steps:
• Plotting the data to get an idea of the path of the time series and to assess how stable the data are in terms of mean and variance.
• Examining the Autocorrelation (AC) and Partial Autocorrelation (PAC) coefficients at specific lags, both graphically through the correlogram of the AC and PAC and through significance tests: either a joint test using the Ljung-Box statistic, whose null hypothesis is that the autocorrelations up to a given lag are jointly insignificant, or an individual test of each coefficient against its confidence bounds.
Stationarity Test: In time series analysis, a stationary process is one whose statistical properties remain constant over time. More formally, a process is stationary if its mean, variance, and autocorrelation structure do not change over time.
The concept of stationarity is important in time series analysis because many statistical models and methods assume that the data are stationary. If the data are non-stationary, these methods may not provide accurate results [15].
There are several tests that can be used to determine whether a time series is stationary, including the Augmented Dickey-Fuller (ADF) test, the Phillips-Perron (PP) test, and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test [16].
• Augmented Dickey-Fuller (ADF) test: The ADF test is one of the most important and commonly used tests of time series stationarity. Its null hypothesis is that the series has a unit root (is non-stationary). The ADF test statistic is compared with critical values from the Dickey-Fuller distribution (tabulated, for example, by MacKinnon), which differ from those of the standard normal distribution. If the ADF test statistic is less (more negative) than the critical value, the null hypothesis of non-stationarity is rejected, and the time series is considered stationary. If the ADF test statistic is greater than the critical value, the null hypothesis fails to be rejected, and the time series is considered non-stationary [17].
• Phillips-Perron (PP) test: The PP test uses the same unit-root null hypothesis as the ADF test but is based on the more general assumption that the time series is generated by an ARIMA process, correcting the test statistic for serial correlation instead of adding lagged differences. This can make it more robust than the ADF test. The PP statistic is calculated and compared to the MacKinnon critical value. If the calculated value is less (more negative) than the critical value, the unit-root null hypothesis is rejected and the series is stationary; otherwise the series is non-stationary.
• Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test: The KPSS test reverses the hypotheses of the two tests above: its null hypothesis is that the series is stationary, so a test statistic above the critical value indicates non-stationarity. Once stationarity has been established, the Autocorrelation (AC) and Partial Autocorrelation (PAC) coefficients are re-examined using the correlogram: through the PAC coefficients the degree of Autoregression (AR) can be determined, while through the AC coefficients the degree of Moving Average (MA) can be determined.
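The correlogram quantities used in the identification stage can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the software used in the study; the function names (`difference`, `acf`, `pacf`) and the sample numbers are assumptions made for the example. The partial autocorrelations are computed with the Durbin-Levinson recursion.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (r_0 = 1 is omitted)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result, phi_prev = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[0]
            phi = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
        phi_prev = phi
    return result


# Illustrative numbers only (hypothetical log-production values, not the study's data).
y = [4.1, 4.3, 4.2, 4.6, 4.8, 4.7, 5.1, 5.0, 5.4, 5.6, 5.5, 5.9]
dy = difference(y, d=1)  # first difference, as used when d = 1
print([round(v, 3) for v in acf(dy, 4)])
print([round(v, 3) for v in pacf(dy, 4)])
```

In practice the coefficients are compared with approximate 95% bounds of plus or minus 1.96 divided by the square root of the number of observations to decide which lags are significant.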

Estimation
Once the orders of the ARIMA model have been identified, the estimation stage estimates the model parameters from the historical data using techniques such as maximum likelihood estimation (MLE) or ordinary least squares (OLS). The most commonly used method is MLE, which finds the values of the model parameters that maximize the likelihood of the observed data given the model. The Akaike information criterion (AIC), the Schwarz criterion (SC), the standard error (S.E.) of regression, the estimated error variance (SIGMASQ), and the adjusted R-squared are commonly used criteria for comparing the goodness of fit of different models. A model with lower AIC, SC, S.E. of regression, and SIGMASQ and a higher adjusted R-squared is considered better.
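As a rough illustration of how the information criteria compare candidate models, the sketch below computes the AIC and SC from a model's residuals under the assumption of Gaussian errors. The function names and the residual values are hypothetical; some statistical packages report these criteria divided by the number of observations, which does not change the ranking when all candidates are fitted to the same sample.

```python
import math


def gaussian_loglik(residuals):
    """Concentrated Gaussian log-likelihood, with sigma^2 estimated as SSR / n."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1.0)


def aic(loglik, k):
    """Akaike information criterion; k is the number of estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def sc(loglik, k, n):
    """Schwarz (Bayesian) criterion; penalizes parameters more heavily as n grows."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical residuals from two candidate models fitted to the same sample.
res_a = [0.10, -0.20, 0.15, -0.05, 0.08, -0.12, 0.03, -0.09]
res_b = [0.30, -0.40, 0.35, -0.25, 0.20, -0.32, 0.15, -0.28]
for name, res, k in (("model A", res_a, 3), ("model B", res_b, 2)):
    ll = gaussian_loglik(res)
    print(name, round(aic(ll, k), 3), round(sc(ll, k, len(res)), 3))
```

The model with the smaller criterion value is preferred; because the two criteria penalize extra parameters differently, they can occasionally disagree.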

Diagnostic
The selection of the appropriate model and its fitness for representing the time series data is determined through the following tests:
• t-statistic test, which checks the statistical significance of the model parameters to ensure they are significantly different from zero.
• Residual analysis tests, including the Ljung-Box test, which checks for autocorrelation in the residuals, and the Jarque-Bera test, which checks whether the residuals are normally distributed.
• Model stability test, to ensure that the model is stable over time.
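The two residual tests listed above can be sketched as follows. This is a minimal standard-library illustration with hypothetical residuals, not the study's output. The Ljung-Box Q statistic is returned without a p-value and would be compared with a chi-square critical value; for the Jarque-Bera statistic the p-value is available in closed form because the chi-square distribution with 2 degrees of freedom has survival function exp(-x/2).

```python
import math


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_k r_k^2 / (n-k) over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(v * v for v in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


def jarque_bera(residuals):
    """Return (JB statistic, p-value) for the null of normally distributed residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    m2 = sum(v ** 2 for v in dev) / n
    m3 = sum(v ** 3 for v in dev) / n
    m4 = sum(v ** 4 for v in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # chi-square(2) survival function


# Hypothetical residuals for illustration only.
res = [0.12, -0.08, 0.05, -0.15, 0.09, 0.02, -0.11, 0.07, -0.03, 0.04]
print(round(ljung_box_q(res, 3), 3))
print(tuple(round(v, 3) for v in jarque_bera(res)))
```

A Q statistic below the chi-square critical value and a Jarque-Bera p-value above 0.05 are both consistent with well-behaved residuals. When Q is computed on ARMA residuals, the degrees of freedom are usually reduced by the number of estimated AR and MA parameters.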

Forecasting
The forecasting stage is the final step in the process of studying and analyzing time series data. After verifying the validity of the chosen model, the model is used to forecast the future values of the phenomenon being studied for the desired period. This is done by substituting the previous and current values of the dependent variable $y_t$ and the estimated residuals $e_t$ into the fitted equation to obtain the first forecasted value $\hat{y}_{t+1}$ for one time period ahead, where $\hat{y}$ denotes the forecast values of the $y$ variable. That forecast is then used in place of the unknown future observation to forecast the next period, and so on for the subsequent periods.
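The recursive substitution described above can be sketched for a small model. The example below uses an ARIMA (1,1,1) purely for brevity (the model selected later in this study has higher orders), and the parameter values, the function name, and the data are hypothetical. Future error terms are set to their expected value of zero.

```python
def forecast_arima_111(y, last_residual, c, phi, theta, steps):
    """Recursive forecasts for w_t = c + phi*w_{t-1} + theta*e_{t-1} + e_t,
    where w_t = y_t - y_{t-1} is the first difference of the series."""
    w_prev = y[-1] - y[-2]   # last observed difference
    e_prev = last_residual   # last estimated residual
    level = y[-1]
    forecasts = []
    for _ in range(steps):
        w_hat = c + phi * w_prev + theta * e_prev
        level += w_hat           # undo the differencing
        forecasts.append(level)
        w_prev = w_hat           # the forecast replaces the unknown observation
        e_prev = 0.0             # future shocks have expectation zero
    return forecasts


# Hypothetical inputs for illustration only.
history = [10.0, 12.0]
print(forecast_arima_111(history, last_residual=0.5, c=0.0,
                         phi=0.5, theta=0.2, steps=3))
```

If the model is fitted to the natural logarithm of production, the forecasts are in log units and need to be transformed back (for example with the exponential function) before being read in thousand tons.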

Results and Discussion
To build a high-accuracy predictive model for maize production in Iraq from the first half of 2022 to the second half of 2026, it is necessary to first ensure the stationarity of the maize production data, which is the first stage of the Box-Jenkins methodology. This is done by analysing the production data from 1980 to 2021.

Stationarity Test
To test whether the time series of maize production in Iraq from 1980 to 2021 is stationary at level, the series is plotted to identify its behavior during that period, as shown in Figure 1.
From Figure 1, it is evident that the time series increases over time, indicating a general trend; therefore, the time series is not stationary. This requires taking the first differences of the time series to make it stationary. For further accuracy, we plot both the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF), as shown in Figure 2. It is evident from Figure 2 that ACF and PACF coefficients lie outside the 5% confidence bounds. This supports the conclusion that the time series is not stationary at level.

Unit Root Test
For more accuracy in determining the stationarity of the series, the ADF and PP tests are used, which are the most important unit root tests. Table 1 shows the results of both tests at level and at first difference at the 5% significance level, under three specifications: with intercept, with trend and intercept, and with neither. According to Table 1, the p-value is greater than 0.05 at level, indicating the presence of a unit root. Therefore, we fail to reject the null hypothesis that the time series of maize production is non-stationary at level. However, the p-value is less than 0.05 after taking the first difference of the series, so the null hypothesis is rejected in favor of the alternative that the series is stationary at the first difference. Hence, we can conclude that the time series of maize production in Iraq is non-stationary at level and stationary at the first difference, confirming the results obtained from the graphical representation and the autocorrelation functions.
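The logic of a unit root test at level and at first difference can be illustrated with the simplest Dickey-Fuller regression, which includes an intercept but no lagged differences. This is only a sketch: the ADF test reported in Table 1 adds lagged differences and uses software-supplied critical values, and the series below is hypothetical. The statistic must be compared with Dickey-Fuller (MacKinnon) critical values, not with the usual t or normal tables.

```python
import math


def dickey_fuller_t(series):
    """t-statistic on gamma in: dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = series[:-1]                                   # lagged level y_{t-1}
    dy = [b - a for a, b in zip(series, series[1:])]  # first difference
    n = len(dy)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_dy) for xi, yi in zip(x, dy))
    gamma = sxy / sxx
    alpha = mean_dy - gamma * mean_x
    ssr = sum((yi - alpha - gamma * xi) ** 2 for xi, yi in zip(x, dy))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / se_gamma


# Hypothetical trending series, for illustration only.
level = [10, 12, 11, 14, 15, 17, 16, 19, 21, 20, 23, 25]
first_diff = [b - a for a, b in zip(level, level[1:])]
print("level:", round(dickey_fuller_t(level), 2))
print("first difference:", round(dickey_fuller_t(first_diff), 2))
```

A statistic close to zero at level and a strongly negative statistic at first difference is the pattern that corresponds to a series that is non-stationary at level and stationary after differencing once.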

Identification and Diagnosis
The ARIMA model orders (p, d, q) are determined using the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF), based on how the coefficients of each function behave as the lag length increases. The PACF determines the order of AR: it becomes insignificant after a certain number of significant lags, and that number is the order of AR. Similarly, the ACF determines the order of MA: it becomes insignificant after a certain number of lags, and the number of significant lags is the order of MA. If both the ACF and the PACF oscillate and do not die out after a certain number of lags, the model is ARMA (p, q). Figure 3 illustrates the ACF and PACF of maize production in Iraq for the period 1980-2021. From Figure 3, it is evident that the autocorrelation coefficients are outside the confidence interval at lags 4, 6, and 12, while the partial autocorrelation coefficients are outside the confidence interval at lags 4, 6, and 8. Therefore, several candidate models can be suggested for analyzing and forecasting the maize production time series in Iraq, including ARIMA (4,1,4), ARIMA (4,1,6), and ARIMA (4,1,12).
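The step from significant lags to candidate orders can be sketched as below. The helper names are hypothetical, the correlogram values in the first call are invented, and forming every AR-MA combination is an assumption made for illustration; the study's own candidate list may be shorter. The lag sets in the second call are the ones read from Figure 3 in the text.

```python
import math
from itertools import product


def significant_lags(coefficients, n_obs, z=1.96):
    """Lags (1-based) whose coefficient lies outside +/- z / sqrt(n_obs)."""
    bound = z / math.sqrt(n_obs)
    return [lag for lag, r in enumerate(coefficients, start=1) if abs(r) > bound]


def candidate_orders(ar_lags, ma_lags, d=1):
    """All (p, d, q) combinations of the significant PACF and ACF lags."""
    return [(p, d, q) for p, q in product(ar_lags, ma_lags)]


# Invented correlogram values, for illustration only.
print(significant_lags([0.50, 0.10, -0.30], n_obs=100))

# Significant lags as reported in the text: PACF at 4, 6, 8 and ACF at 4, 6, 12.
for order in candidate_orders([4, 6, 8], [4, 6, 12], d=1):
    print("ARIMA", order)
```

Each candidate would then be estimated and compared on the criteria described in the estimation stage.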

Estimation
After determining the orders of both AR and MA by observing the autocorrelation and partial autocorrelation functions, the estimation stage follows, where Ordinary Least Squares (OLS) is used.
Based on the results for the proposed models, shown in Table 2, the models can be compared to determine which one has the highest accuracy and forecasting ability. From Table 2, it is evident that, among the proposed ARIMA models, the ARIMA (4,1,12) model has the lowest values of AIC, SC, S.E. of regression, and SIGMASQ, and the highest value of the adjusted R-squared criterion. Therefore, the ARIMA (4,1,12) model achieves the highest predictive accuracy among the proposed models, and it is the appropriate model for describing the data and making forecasts. Table 3 shows the estimation of the ARIMA (4,1,12) model using the Ordinary Least Squares (OLS) method.

Diagnostic Checking of the Model
The suitability of the ARIMA (4,1,12) model for representing the time series data of maize production in Iraq, and its ability to make predictions, were tested as follows:
• Based on Table 3, the AR and MA parameters are statistically significant, as their p-values are less than 0.05.
• The residuals of the selected ARIMA (4,1,12) model were examined graphically by plotting their ACF and PACF to check for autocorrelation, as shown in Figure 4. All residual autocorrelation and partial autocorrelation coefficients fall within the confidence bounds, so the model does not suffer from an autocorrelation problem.
• The Jarque-Bera test was conducted to determine whether the residuals are normally distributed, as shown in Figure 5. The p-value was greater than 0.05, so the null hypothesis that the residuals are normally distributed is not rejected.
• Finally, the stability of the selected ARIMA (4,1,12) model was tested using the inverse roots of the AR and MA characteristic polynomials. As shown in Figure 6, all inverse roots are located inside the unit circle, confirming the stability of the ARIMA (4,1,12) model.
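The stability check based on inverse roots can be sketched without any external library. The example below finds the inverse roots of an AR polynomial with the Durand-Kerner iteration, which is a technique chosen for this illustration and not necessarily what the study's software uses; the coefficients are hypothetical. For an MA polynomial written as 1 + theta_1 L + ... + theta_q L^q, the same function can be called with the negated theta coefficients.

```python
def inverse_roots(coeffs, iterations=200):
    """Inverse roots of 1 - c1*L - ... - cp*L^p, i.e. the roots of
    lambda^p - c1*lambda^(p-1) - ... - cp = 0, by Durand-Kerner iteration."""
    degree = len(coeffs)
    poly = [1.0] + [-c for c in coeffs]              # monic, highest power first
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]  # distinct starting points
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            value = 0j
            for a in poly:                           # Horner evaluation at r
                value = value * r + a
            denom = 1 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= (r - s)
            updated.append(r - value / denom)
        roots = updated
    return roots


def is_stable(coeffs):
    """True when every inverse root lies strictly inside the unit circle."""
    return all(abs(r) < 1.0 for r in inverse_roots(coeffs))


# Hypothetical AR(2) coefficients, for illustration only.
phi = [0.5, 0.3]
print([round(abs(r), 4) for r in inverse_roots(phi)])
print(is_stable(phi))
```

All moduli below one correspond to the situation described for Figure 6, where every inverse root falls inside the unit circle.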

Forecasting
Forecasting is one of the main objectives of any time series analysis. After conducting the diagnostic tests, it was found that the ARIMA (4,1,12) model can be used for accurate forecasting. The next step is to use the ARIMA (4,1,12) model to forecast maize production in Iraq for the next five years (2022-2026) at the 0.05 significance level, as shown in Table 4 and Figure 7. It is apparent from Table 4 and Figure 7 that the forecasted production of maize in Iraq fluctuates between highs and lows, similar to the production of this crop during the study period, which extended from 1980 to 2021. The expected maize production is projected to increase from 500 thousand tons in the first half of 2022 to 585 thousand tons in the second half of 2023. It then fluctuates until the second half of 2026: it reaches 574 thousand tons in the second half of 2024, decreases to 549 thousand tons in the first half of 2026, and increases slightly to 559 thousand tons in the second half of 2026.

Conclusion
Forecasting is an essential tool for predicting the production and productivity of any crop in the near future. The autoregressive integrated moving average (ARIMA) model is considered one of the best models when the data consist of at least 50 observations. This study models and forecasts semi-annual maize production in Iraq for the period 2022-2026 using the ARIMA model. The study found that, among the proposed ARIMA models, the ARIMA (4,1,12) model had the lowest values of AIC, SC, S.E. of regression, and SIGMASQ, and the highest value of the adjusted R-squared criterion. The forecasted production of maize in Iraq fluctuates between highs and lows, similar to the production of this crop during the study period from 1980 to 2021. Maize production is forecasted to increase from 500 thousand tons in the first half of 2022 to 585 thousand tons in the second half of 2023. It then fluctuates until the second half of 2026: it reaches 574 thousand tons in the second half of 2024, decreases to 549 thousand tons in the first half of 2026, and increases slightly to 559 thousand tons in the second half of 2026. Decision-makers in Iraq should increase government funding for agriculture, choose high-yielding varieties, expand cultivation, and link farmers with research institutes.

Figure 1. The Production of Maize in Iraq, 1980-2021. Source: Ministry of Planning and International Cooperation, Planning and Monitoring Department, Agricultural Statistics Department, Iraq.

Figure 2. Results of ACF and PACF functions.

Figure 4. Autocorrelation and partial autocorrelation functions for the residual series. It is evident from Figure 4 that the proposed ARIMA (4,1,12) model does not suffer from an autocorrelation problem, as all the autocorrelation and partial autocorrelation coefficients of the residuals fall within the confidence bounds.

Figure 6. Inverse Roots of AR and MA.

Figure 7. Actual and Forecasted Values of Maize Production in Iraq.

Table 1. Results of ADF and PP tests for Maize production in Iraq for the period (1980-2021).

Table 2. The results of the ARIMA (p,d,q) test.

Table 3. Results of estimating the ARIMA model.

Table 4. The Maize Production forecasts in Iraq (thousand tons).