A probabilistic framework for forecasting maize yield response to agricultural inputs with sub-seasonal climate predictions

Crop yield results from the complex interaction between genotype, management, and environment. While farmers have control over what genotype to plant and how to manage it, their decisions are often sub-optimal due to climate variability. Sub-seasonal climate predictions embrace the great potential to improve risk analysis and decision-making. However, adequate frameworks integrating future weather uncertainty to predict crop outcomes are lacking. Maize (Zea mays L.) yields are highly sensitive to weather anomalies, and very responsive to plant density (plants m−2). Thus, economic optimal plat density is conditional to the seasonal weather conditions and can be anticipated with seasonal prospects. The aims of this study were to (i) design a model that describes the yield-to-plant density relationship (herein termed as yield–density) as a function of weather variables, and provides probabilistic forecasts for the economic optimum plant density (EOPD), and (ii) analyze the model predictive performance and sources of uncertainty. We present a novel approach to enable decision-making in agriculture using sub-seasonal climate predictions and Bayesian modeling. This model may inform crop management recommendations and accounts for various sources of uncertainty. A Bayesian hierarchical shrinkage model was fitted to the response of maize yield–density trials performed during the 2010–2019 period across seven states in the United States, identifying the relative importance of key weather, crop, and soil variables. Tercile forecasts of precipitation and temperature from the International Research Institute were used to forecast EOPD before the start of the season. The variables with the greatest influence on the yield–density relationship were weather anomalies, especially those variables indicating months with above-normal temperatures. Improvements on climate forecasting may also improve forecasts on yield responses to management, as we found reduced bias and error (by a factor >10), and greater precision (e.g. R2 increased from 0.26 to 0.32) for cases where weather forecasts matched observations. This study may contribute to the development of decision-support tools that can trigger discussions between farmers and consultants about management strategies and their associated risks.


Introduction
Seasonal and sub-seasonal climate predictions are rapidly evolving into decision support tools to assist the agricultural, environmental, health and policy sectors. The advance of computational algorithms and decision support tools have led to a substantial increase in the availability of climatic forecast products [1,2]. In addition, the development of machine learning algorithms has created new opportunities for improving climate predictions [3,4]. The agricultural sector is one of the main consumers of climatic forecasts due to the causal link between precipitation and temperature with crop yields [5]. Agricultural outputs are greatly affected by climate anomalies, like extreme temperature and precipitation [6], and several studies have highlighted the value of climate forecasts to optimize management practices in agriculture [7][8][9][10][11]. In order to augment yield prediction and in-season adjustments of management recommendations, it is crucial to enhance and integrate seasonal climate forecasts into crop models. The advantages of probabilistic seasonal and sub-seasonal forecasts remain underexploited, and they are rarely included in agronomic models used by farmers and crop consultants. Increasing the economic value of seasonal climate predictions requires integrating climate and crop information in a single framework that quantifies uncertainty [12].
Bayesian frameworks enable the use of probabilistic climate predictions to better tailor crop management to the expected deviation in climate from normal conditions [13][14][15]. Best crop management practices can be anticipated with data from previous growing seasons, but they also depend heavily on seasonal weather conditions. For this reason, best crop management practices are fully defined during the growing season. Meanwhile, climate predictions are inevitably bound to uncertainty, and the forecasting methods are yet imperfect [16][17][18][19]. Agricultural productivity is equally challenging to predict, with an additional unknown interplay between genotype, environment, and management, GxExM [20,21]. Predictions that are highly weather dependent, like grain yield, thus fit best with a Bayesian framework that assumes all variables random, quantifies uncertainty, and allows probabilistic inference. Results are obtained in the form of probability distributions rather than single values and acknowledge the estimation uncertainty. Previous efforts have successfully embedded weather forecasts into yieldforecasting models [22][23][24], but most current management recommendations are still categorical (i.e. low, medium, or high) or do not use self-explanatory methods (e.g. machine learning algorithms). Rizzo et al [25] integrated sub-seasonal predictions into recommendation models, but without a Bayesian framework and probabilistic forecasts. In this study, we develop a novel application of seasonal climate predictions to inform optimal use of inputs within a Bayesian statistical framework.
One of the most important field crops with a strong demand for predictive management models is maize (Zea mays L.). The existence of GxExM interactions and seasonal weather variation justifies the need to tailor management to the GxE scenarios. GxExM interactions are highly sensitive to climate extremes [26][27][28], even more for maize than for other crops [29]. Critical management decisions in maize include Nitrogen fertilization and plant density (plants m −2 ) determination. In this study, we focus solely on a major production cost for maize, seeds, for the determination of the optimal plant density (OPD, measured in plants m −2 ). Usually, a proxy to define optimum plant density is the combination of genotype and yield environment [30,31]. But, different yields may present similar OPD [32] and thus the cost of seeds needs to be considered to estimate an economic optimum plant density (EOPD). There is a need for a shift towards a new framework that predicts crop management responses accounting for the uncertainty associated to weather, maintaining the interpretability of the model at its core. A reliable approach would be using a mechanistic model that can analyze the yield response to plant density (herein termed as yield-density) [33]. The main objective of this study was to develop a framework to assist agronomic decisions with probabilistic forecasts. The study was focused on forecasting EOPD for maize, as a case study of the main objective. The specific objectives were to (i) design a yield-density model that could identify, determine the relative importance of key weather variables defining this relationship, and provide probabilistic forecasts for the EOPD, and (ii) analyze the predictive performance and the sources of uncertainty.

Data retrieval
Maize hybrid by plant density trials were conducted in 89 counties from seven states in the US during the years 2010-2019. The sites were selected to have similar germplasm among hybrids, between latitudes 37.5 • N and 43.4 • N. The states of Arkansas, Illinois, Indiana, Iowa, Kansas, Nebraska, Missouri, and Ohio were selected to perform the trials. The experimental design was a randomized complete block design in a split-plot arrangement, with plant density as the plot treatment and hybrid in the subplot, and with all sites presenting between 2 and 5 replicates. There were five target plant densities: 44 475; 64 247; 84 016; 103 784 and 123 553 plants ha −1 . The trials were conducted in research sites and farmer fields with fourrow plots (3.05 m wide and 5.4 m long) with 0.76 m row spacing. Plots were consistently fertilized with all recommended nutrients for their respective site and related to their attainable yield levels. At all sites, yield was recorded on the central two rows in each plot, grain moisture was measured, and yields adjusted to 155 g kg −1 moisture. Maize hybrid comparative relative maturity (CRM) and planting date were obtained from Corteva Agriscience. In addition, county-level yield records were retrieved from the USDA-NASS [34] and their 95th percentile was used as a proxy for potential yield.
County-level climate, weather, and soil information was retrieved to fit the statistical model (figure 1). Daily solar radiation (MJ ha −1 ), daily mean temperature ( • C) and monthly precipitation (mm) were retrieved from daymet [35], for the years 1990-2019 (30 years). Water holding capacity from 0-100 cm depth was retrieved from open land map [36], whereas subsurface soil moisture (0-20 cm) was retrieved for 1 month pre-planting, 2 weeks preplanting and at planting time [37]. To obtain countylevel data, we performed an average across all the points in the county weather data grid. Climate terciles of both precipitation and temperature were estimated for 2, 3, and 4 months after planting for each county: using percentiles 33.3 and 66.6 as thresholds, each year from the set 2010-2019 was determined to have either above-, near-, or belownormal precipitation or temperature, for months 2-4 after planting. This format is compatible with the International Research Institute's (IRI) sub-seasonal probabilistic tercile forecasts [38,39], that was used later on to include the climatic predictions and perform forecasts of EOPD.

Statistical model
The data described above were fitted to a Bayesian hierarchical shrinkage model (see Code availability section for more details). The core of the model is a power-law function, that describes aboveground biomass as a function of density: where α is a constant, and β describes the responsiveness to plant density and is expected to be a multiple of a quarter, as a consequence of metabolic scaling [40]. This model was utilized for biological soundness [40][41][42] and modeling convenience, and it has been previously tested to model grain yield [43]. The power-law function is one of the most biologically-robust models to explain biomass response to density [40] and was extended to selfthinning of plant organs (e.g. kernels), with some additional species-and organ-specific constants [41]. Moreover, modeling a curve with a single shape parameter is convenient to avoid identifiability issues when parameters are themselves linear models of other covariables (see below for more information). Thus, yield response to density is described with a power-law function, where y i is the observed yield of the ith observation, that is normally distributed with expectation µ i and variance σ 2 . The expected value is modelled with a power-law function of plant density, depending on the parameters α i and β i . Here, β i represents the responsiveness of yield to density for the ith observation and depends on the resource availability (i.e. the environment) of that observation. Both parameters are themselves linear functions of soil, crop and weather variables. For example, where x ′ α,i is a vector containing information for the ith observation about solar radiation and USDA county-level yield at the 95th percentile, and γ α is a vector of the effects of these variables on parameter α. This means that parameter α is a linear function with an intercept and two variables that are good proxies of potential yield. Similarly, the parameter shaping the curvature of the yield-density reationship, β, is defined as the product of a vector x ′ β,i containing information about crop, soil and weather for the ith observation: hybrid relative maturity (i.e. cycle duration, CRM), soil water content at field capacity, sub-superficial soil moisture at 1 month pre-planting, sub-superficial soil moisture at 1 week pre-planting and sub-superficial soil moisture at planting, average temperature and precipitation, and anomalies at 1, 2, and 3 months after planting. Thus, β i (responsiveness to density of the ith observation) is a linear function of crop cycle duration and other yield-limiting factors for that environment, mostly variables that affect seasonal variation of plant growth: soil water content, soil water holding capacity, precipitation, temperature, and anomalies in both precipitation and temperature. Model hyperparameters were selected in order to obtain good posterior diagnostics (i.e. convergence and adequate mixing of the chains) to obtain reliable results.
Both parameters γ α [equation (4)] and γ β [equation (5)] were modeled with regularized horseshoe priors, a Bayesian prior that performs shrinkage. Shrinkage models are useful and commonly used to detect the most important weather variables affecting yield [43,44]. To our knowledge, this is the first time that a Bayesian shrinkage model is implemented in agronomy to detect the most important weather variables affecting management effects on yield. The Bayesian approach is crucial to assess the risk bound to management decisions. The regularized horseshoe prior is a novel Bayesian method for dimension reduction [45] and has been successfully applied in other fields with sparse data [45][46][47], but its implementation is still incipient overall. The effects of the crop and environmental variables on the individual parameters were the product of a combination of a scaling parameter, and hierarchical shrinkage parameters [48]. A variable was considered relevant if its local shrinkage coefficient's 95% credible interval excluded zero (figure 2). Moreover, the magnitude of standardized coefficient was interpreted as the relative importance of the variables (see Code availability section for more information on the statistical model). The model was fitted using Stan [49] and checked for chain convergence (Rhat < 1.02), and appropriate effective sample size. Coefficient posteriors were analyzed regarding their mean and 95% credible intervals.
The parameter posteriors were used to generate distributions of α and β forecasts for the year left out by combining them with subsurface soil moisture and probabilistic sub-seasonal weather forecasts from IRI [38,39]. The performance of these seasonal forecasts has been evaluated and proved successful [2,50]. For α and β forecasts, the indicators (aboveor below-normal precipitation and temperature) were replaced for random variables generated by a Bernoulli distribution [51] that can be described as where I kmj is the indicator for the condition j respect to normal values (j = above normal, below normal) of the variable m (m = precipitation, temperature) at the moment k (k = 2, 3, or 4 months after planting), with probability of occurrence θ kmj (retrieved from IRI data library). This step incorporates the uncertainty associated to future weather affecting plant growth, synthesized in the sub-seasonal forecast. Parameters α and β were used to estimate the EOPD as the level of input where marginal benefit equals marginal cost.
EOPD was set as 10 log ( cp αβ ) β−1 , where the slope equals the cost:price ratio cp-a random variable following a normal distribution N (0.22, 0.01). A theoretical maximum EOPD of 14 plants m −2 was set [32].
The model was cross-validated leaving one year of data out. The predictive performance was evaluated for both the posteriors of the predictions and the posterior means as point estimates. For the posteriors, we calculated the percentage of observations contained in the 95% credible intervals. For the point estimates, we computed percentage bias error (indicating differences between means of predictions and observations), percentage additive bias (related to systematic additive issues on the predictions), relative efficiency of the model (Erel), Ji and Gallo's agreement coefficient (that contemplates both accuracy and precision), root mean squared error (RMSE) and coefficient of determination (R 2 ). The first four metrics were selected to have a complete dissection and diagnostic of the model performance [52]. RMSE and R 2 were included due to their widespread utilization in scientific literature. Performance metrics were computed twice: once for all posterior draws, and another clustered by agreement between simulated and observed weather indicators [i.e. whether the indicators generated in (equation (5) matched the observed). In this way, we diagnosed the sensitivity of the different performance metrics to the uncertainty in weather outcomes. (C) Standardized coefficients affecting parameters α (intercept) and β (shape). 'α (intercept)' and 'β (shape)' are the intercepts (i.e. baselines) of those parameters. Points indicate posterior means; thick, colored bars indicate 75% credible intervals and black bars indicate 95% credible intervals. The colors indicate whether the 95% credible interval includes zero (meaning that variable is not relevant, red) or not (meaning that variable is relevant, green).

Results
The current study implements a power-law model (yield = α · density β ), where α is a proxy for yield potential, and β the yield responsiveness to density. The output of the model characterizes both crop and environmental effects on the yield-density relationship and offers probabilistic forecasting tools. The effects of weather, soil and crop on this relationship are characterized regarding their relevance, magnitude, and associated uncertainty. Normally, Frequentist approaches study the uncertainty associated to the magnitude of the effects, but not the one associated to shrinkage. The current study analyzes uncertainties associated to effects and shrinkage.
Dynamic weather components such as precipitation and temperature anomalies presented the greatest relevance describing the yield response to density (β), but also with greatest uncertainty. The most important variables indicated by the model were CRM, soil water holding capacity, soil water content 1 month before planting, and the indicators of anomalies in monthly precipitation and temperature ( figure 2). Moreover, the most influential variables (i.e. with greatest standardized coefficients) were the anomalies in monthly precipitation and temperature. Above-normal temperatures had the greatest relative importance in explaining the yield-density relationship: the effect ranged from 0.62 to 1.94 (absolute score), compared to the median for all the components of β (0.43). Above-normal temperatures were also associated to relatively large estimation uncertainty (standard errors between 0.15 and 0.21, when the median for the other components was 0.15). Months 2 and 3 with above-normal mean temperatures were negatively associated with responsiveness (relative score ± standard error of −1.94 ± 0.23, and −0.95 ± 0.15), whereas month 4 after planting was positively associated to β (0.62 ± 0.21). Note that the lower relative importance and greater uncertainty for above-normal temperatures for month 4 may be associated to different effects at lower and higher latitudes.
The estimates of β were centered at the theoretical scaling parameter 0.25 [40], and affected by crop and weather features. Environmental stress and specific maize hybrid traits such as growth cycle duration, CRM, reduced the magnitude of the responsiveness to plant density [figures 2(b) and (c)]: under normal conditions (i.e. all indicators of anomalies set to zero), the estimate for β was approximately 0.246. A hybrid of CRM 100 had a β parameter of 0.06 points lower than a hybrid of CRM 120. Moreover, environments with high mean temperatures in months 2 and 3 after planting had a reduced response to density (e.g. 2012). The (static) effect of relative maturity (CRM) on β can be interpreted as differences in aboveground biomass and harvest index, both greater for late-maturing hybrids with greater CRM scores. The positive effect of soil water holding capacity (also static) on the response may be attributed to a greater water storage to maintain enough supply during critical period to maintain growth rates.

Model predictive performance
The set of model diagnostics indicate that the model performs best at yields greater than 7 Mg ha −1 [ figure 3(a)] and fails more often for lower yields and extreme cases [ figure 3(b)]. The greatest biases were observed for non-matching simulated and observed weather indicators [figure 3(a)], with worse predictive metrics. For example, R 2 increased from 0.26 to 0.32 when weather simulations and observations did not match, compared to when they did (table 1). Considering each predictive posterior, 97.5% of the points were contained inside their 95% credible interval. Furthermore, performance metrics of point estimates improved when analyzing only the draws from the posteriors when the simulated weather indicators matched the observed situation (i.e. above/below normal values).
The EOPD forecasts were a function of the crop and environmental factors that define the responsiveness (β), where higher values of the parameter correspond to greater EOPD. Better growing conditions will then correspond to greater EOPD for a given site ( figure 4). The EOPD forecasts are inevitably bound to uncertainty from (i) the estimation of the parameters, (ii) the subseasonal predictions, and (iii) observing new data. Following this rationale, forecasts with a probability of above-normal greater than 1/3 would shift the EOPD posteriors towards greater values. In addition, the forecast uncertainty was also conditioned by differential certainty about the effects of different conditions. One of the most contrasting examples can be portrayed with temperature anomalies in the second month after planting: above normal values presented a standard error of the effect on β of 1.92 10 −5 , while below normal temperatures showed less uncertainty, namely 1.66 10 −5 . Translated to EOPD values, a change in the indicator Table 1. Relevant metrics for predictive performance. Percentage bias error (PBE, indicating differences between means of predictions and observations), percentage additive bias (PAB, related to systematic additive issues on the predictions), relative efficiency of the model (Erel), Ji and Gallo's agreement coefficient (AC, that contemplates both accuracy and precision), root mean squared error (RMSE) and coefficient of determination (R 2 ) for the all the draws (General), the ones where simulated and observed weathers indicators did not match (No match), and where they did (match  from above-normal to below normal implies reducing the interquartile range from 1.9 Mg ha −1 to 0.74 Mg ha −1 , ceteris paribus.

Discussion
This study introduces a novel framework leveraging climate predictions into an agronomic management recommendation model to assess risk. The probabilistic nature of this product may assist management decisions before planting and, even more importantly, trigger discussions about management strategies for specific GxE scenarios that better understand the associated risk [23]. We merged a model grounded on biophysical insights with a set of crop, environmental and management data to model the yield-density response. Our approach could be generalized to other production inputs such as nutrients and water. This is a novel use of Bayesian shrinkage model to predict and interpret the influence of weather on the yield-density relationship.
The predictive performance of a model describing the yield-density relationship was low and expected to be so, because of the uncertainty in weather conditions defining the yield-density response.
Other approaches to early seasonal yield forecasting report poor performance metrics, like R 2 with values skewed towards zero. Forecasts considering more information about the growing season show a much improved performance [53]. In addition to the challenge of an early prediction, maize presents a formidable challenge for yield predictions under rainfed conditions: maize is sensitive to water deficit and has limited capacity to compensate kernel loss with increased kernel weights [54][55][56]. The integration of climate forecast uncertainty to estimation uncertainty reduces the precision of the predictions and forecasts of the yield-density relationship, but provides more realistic prospects [57]. The inclusion of climate forecasts into yield forecasts is rarely discussed and reported in agricultural applications, possibly leading to over-optimistic uncertainty estimates. Interval or distributional forecasts [58] are used in some fields such as meteorology and ecology [58,59], but are still rare in agronomy. Nonetheless, forecast accuracy is a component of an honest prediction about the future.
The set of most important variables highlights the seasonality of yield-density relationships and thus, the necessity to redesign current ex-post analyses to manage recommendations with frameworks including dynamic variables. The effect of a static factor like CRM on management effects was consistent with the literature, where short-cycle hybrids are less responsive to plant density [31] [ figure 2(b)]. The decisive role of the dynamic factors affecting the yield-density relationship was also consistent with previous documentation, describing weather effects on the response of maize yield to inputs like nitrogen [60] and plant density [27,32]. Notoriously, our study suggests anomalies in precipitation had a lower effect on responsiveness than anomalies in temperature, which may be related to greater development rates and reduced intercepted radiation [26,28]. The most influential weather variables were temperature and precipitation anomalies. Integrating dynamic variables into recommendation models is crucial, in order to account for that source of uncertainty in the response to inputs, from agronomic [61,62], environmental [62], and economic viewpoints [63].
The probabilistic approach presented in this study may assist farmers in their early management plans with a grasp of the risk associated to their decisions. One could argue that most farmers tend to think Bayesian when interpreting weather forecast data [13,64,65]. Farmers can only hypothesize what the next growing season will entail, yet they need to make management decisions well in advance, facing climatic and socioeconomic risks [66]. Their expectations of yields are formed by a combination of their prior knowledge of field history and characteristics, and seasonal weather prospects. Management practices are adjusted categorically [12], e.g. selecting pre-defined 'low' or 'high' plant densities for a site. This approach is limited by not accounting for weather uncertainty and the GxExM interactions. The use of categorical recommendations (i.e. low or high) may stem partially from the categorization of ENSO phases that result in categorical expectations of yield and consequently crop management [25,67]. Probabilistic models can act as triggers for conversations among stakeholders on the possible scenarios and their associated risks. Moreover, new generations of informed data-driven approaches integrate mathematical and statistical models to mimick how managers and farmers deal with uncertainties [68]. Future forecast models must support the design of effective GxExM adaptation strategies to enhance advances in crop improvement technologies, agriculture resilience, economic viability, and climate change mitigation [69] and provide insights for agronomic interpretations [70]. The shift of paradigm towards knowledge-based decisions requires building solid decision support systems that merge biological theory with large data and offer probabilistic inference, but also agents with effective communication skills to provide pragmatically interpretations of the outcomes [70,71].
The present study portrays a framework to determine optimum management practices with a case study on plant density determination. Future studies may expand the framework to increase its precision and applicability. Some models worth exploring may include correlation matrices of precipitation and temperature [72], more complex relationships between yield and weather variables, yield and other inputs and their interactions (e.g. density by nitrogen), and interactions the before mentioned variables with genotypes. At the same time, models that are currently unable to predict some extreme events [20] may improve overall performance in the future [73], whereas products with higher temporal resolution may expand the possible applications (e.g. irrigation management) and reduce some estimation uncertainty. The current study focused on the central region of the US, assuming homogeneous effects of weather across sites. However, latitude may be the driver of different effects on the GxE scenarios. One simplification of the present study lies in the use of aggregated county data for forecasts, meaning information loss of spatial heterogeneity. Higher resolution of weather and soil data may be an easily attainable task to improve model predictive performance, especially in terms of precision. Future inclusion of GxE interactions to the model can account for better tailoring genotypes and management practices to expected climates [33]. New artificial intelligence models and correlation structures may even unriddle yet undiscovered aspects on the GxExM interaction [74]. Last, future frameworks should also consider the complexity of multi-objective optimization approaches in a Bayesian stochastic environment, considering the objectives of farmers, market and outcomes uncertainties (e.g. expected cost and price distributions), and farmer risk behavior [75,76]. Such decision support systems will help end users make better decisions by being able to assess potential outcomes and agronomic, economic and social tradeoffs more accurately in uncertain environments [77].

Conclusion
We presented a novel framework that leverages climate predictions into probabilistic forecasts of yield response to management. This specific study provides seasonal forecasts of EOPD, merging a biophysical informed model with a set of crop and environmental data (observations and forecasts) within a Bayesian shrinkage model. The most relevant information for describing the yield-density relationship was hybrid relative maturity (CRM) and most importantly, the dynamic variables that indicate anomalies in precipitation and temperature. The introduction of this framework contributes to shift the paradigm from using static to dynamic variables, and towards the adoption of probability-based recommendation models. Future studies should address GxE interactions, multi-objective optimization approaches, to contribute to research goals ranging from breeding to agronomy for developing specific GxExM strategies to fit our current complex agricultural farming systems.

Data availability statement
The data cannot be made publicly available upon publication because they are owned by a third party and the terms of use prevent public distribution.