Hybrid modeling of evapotranspiration: inferring stomatal and aerodynamic resistances using combined physics-based and machine learning

The process of evapotranspiration transfers liquid water from vegetation and soil surfaces to the atmosphere, the so-called latent heat flux ( QLE ), and modulates the Earth’s energy, water, and carbon cycle. Vegetation controls QLE by regulating leaf stomata opening (surface resistance rs in the Big Leaf approach) and by altering surface roughness (aerodynamic resistance ra ). Estimating rs and ra across different vegetation types is a key challenge in predicting QLE . We propose a hybrid approach that combines mechanistic modeling and machine learning for modeling QLE . The hybrid model combines a feed-forward neural network which estimates the resistances from observations as intermediate variables and a mechanistic model in an end-to-end setting. In the hybrid modeling setup, we make use of the Penman–Monteith equation in conjunction with multi-year flux measurements across different forest and grassland sites from the FLUXNET database. This hybrid model setup is successful in predicting QLE , however, this approach leads to equifinal solutions in terms of estimated physical parameters. We follow two different strategies to constrain the hybrid model and therefore control for the equifinality that arises when the two resistances are estimated simultaneously. One strategy is to impose an a priori constraint on ra based on mechanistic assumptions (theory-driven strategy), while the other strategy makes use of more observational data and adds a constraint in predicting ra through multi-task learning of both latent and sensible heat flux ( QH ; data-driven strategy) together. Our results show that all hybrid models predict the target variables with a high degree of success, with R2 = 0.82–0.89 for grasslands and R2 = 0.70–0.80 for forest sites at the mean diurnal scale. The predicted rs and ra show strong physical consistency across the two regularized hybrid models, but are physically implausible in the under-constrained hybrid model. The hybrid models are robust in reproducing consistent results for energy fluxes and resistances across different scales (diurnal, seasonal, and interannual), reflecting their ability to learn the physical dependence of the target variables on the meteorological inputs. As a next step, we propose to test these heavily observation-informed parameterizations derived through hybrid modeling as a substitute for ad hoc formulations in Earth system models.


Introduction
Evapotranspiration, i.e. surface latent heat flux (Q LE ), plays a key role in driving Earth's energy, water, and carbon cycles, and is primarily controlled by dynamic meteorological conditions and soil water conditions, as well as static properties such as soil characteristics and plant traits (Jung et al 2010, Dou andYang 2018, Ajami 2021). Plants critically influence Q LE mainly through their direct control of transpiration, but also through shaping aerodynamic surface properties (i.e. roughness). They use their leaf stomata to dynamically regulate the water loss to the atmosphere, which depends not only on atmospheric water demand, but also on soil water availability (Damour et al 2010, Kennedy et al 2019, Carminati and Javaux 2020. While the physical drivers that cause water to evaporate are well described and understood, the influence of biological control on Q LE , mainly the transpirative water flux, is more difficult to assess. As a consequence, empirical formulations, especially for surface (r s ) and aerodynamic resistance (r a ), remain used in process-based models, which can lead to large uncertainties in predicting Q LE (Polhamus et al 2013). Most formulations of r s are empirical or rely on optimality concepts, such as minimizing water loss while maximizing carbon assimilation (e.g. Tan et al 2021). As such, these concepts do not take into account the active transpiration mechanism that some plants use to down-regulate leaf temperature through evaporative cooling to prevent leaf overheating at high irradiance and air temperature (Lin et al 2017, Drake et al 2018. Other empirical approaches, e.g. the Jarvis-Stewart formulation, Ball-Berry model, and Leuning model aim to derive parametrizations based on statistical correlations between r s (or canopy resistance) and the key environmental variables (Jarvis 1976, Stewart 1988, Leuning et al 1991, Leuning 1995.
These ad hoc formulations have several drawbacks, e.g. they are considered too rigid, especially when evaluated in a coupled system of atmospherebiosphere feedbacks where some of the environmental variables are actually also a function of r s (Ronda et al 2001). Overall, these empirical representations for r s and r a in deterministic models for Q LE obey physical laws and phenomenological behavior (Krasnopolsky 2013, de Bezenac et al 2017. Yet, they exhibit limited capability to adapt to other or changing vegetation composition or long-term climatic conditions, especially with respect to soil moisture (Damour et al 2010, Medlyn et al 2011, Kennedy et al 2019.
Statistical models have been proposed as alternative approaches to reliably estimate Q LE due to their data-adaptiveness (Tramontana et al 2016, Dou and Yang 2018, Carter and Liang 2019. In particular, approaches that use machine learning (ML) techniques are gaining traction because they can implicitly learn unknown latent processes and constitute a more complete statistical representation of the processes that influence Q LE at different scales in space and time (Jung et al 2009, 2020, Dou and Yang 2018. However, these data-driven models have several drawbacks, such as the need for large amounts of high-quality data, their limited physical consistency, and their lack of mechanistic interpretability (Karpatne et al 2017a(Karpatne et al , 2017b. Accordingly, physics-based models are restricted by the ad hoc assumptions of the system, and ML models are limited by their inability to produce physically interpretable and consistent predictions. Therefore, the combination of mechanistic and ML modeling promises physically interpretable performance of predicting and inferring intermediate (or latent) states and variables by merging the advantages of the causal understanding of physicsbased models and the predictive power of ML. Different approaches have been proposed to circumvent the issues originating from using pure physics-based and ML models. They combine the complementary strengths of both techniques, which enables ML models to capture dynamic patterns and improve the accuracy and physical interpretability of predictions. These methods include a form of physics-guided ML techniques, where the neural network (NN) is constrained by different means to produce predictions that mirror realistic climate conditions and fluxes. The physics-guided ML approaches can be generally subdivided into physics-guided loss functions, initialization, architecture design, and hybrid modeling (Karpatne et al 2017a, Jia et al 2020a, Willard et al 2020. The combination of ML and mechanistic modeling, here denoted hybrid modeling, makes it possible to combine the strengths of both techniques: ensure physical consistency while efficiently harvesting the growing resource of observational data . Therefore, the synergy of both techniques offers promising solutions to the shortcomings encountered in using both techniques separately. Several studies have successfully applied hybrid modeling in hydrological applications, such as the characterization of the different known and unknown variables governing the global water cycle (Kraft et al 2020, simulation of lake temperature dynamics (Jia et al 2020b), and the modeling of global extreme flooding events (Yang et al 2019). Other studies focusing on land-atmosphere interactions of ecosystem fluxes, such as Q LE (Zhao et al 2019), showed that these hybrid approaches allow for better extrapolation and generalization capabilities during extreme conditions.
In this study, we propose a hybrid modeling approach that allows the inference of these biophysical controls based on observational data of Q LE across ecosystems, while adhering to known physical laws . The aim of this study is to offer guidance on how to infer the hidden controls of landatmosphere coupling from observational data using hybrid learning rather than from ad hoc assumptions with rigid parametrizations. The hybrid modeling approach illustrates the ability to provide physically interpretable and accurate predictions against observations at different temporal scales and ecosystems.
The obtained observation-informed parametrizations for r s and r a reveal variability across different vegetation canopy structures, which is unaccounted for by conventional parameterizations.
In the methods in sections 2.1 and 2.2, we describe the hybrid modeling approach and introduce different models of Q LE using the Penman-Monteith (PM) equation (Penman 1948, Monteith 1965) and eddy covariance (EC) flux measurements from several grassland and forest sites (Baldocchi et al 2001, Li et al 2018. Our hybrid models should not only yield accurate predictions of Q LE , but also enable us to better understand the functioning and influence of biophysical processes on Q LE expressed through the surface and aerodynamic resistances. We present and explore the problem of equifinality in our setting (section 2.3.2) (i.e. different combinations of r a and r s may result in the same Q LE ) and propose two conceptually different solutions (theory-versus datadriven) to this problem (section 2.3.3). We evaluate the predictions of our hybrid models for Q LE , r a and r s against purely statistical models as well as against established mechanistic models in section 3.

Methodology
In this section we describe the data pre-processing methods and different model setups used. Section 2.1 describes the data and processing. Section 2.2 defines the physics-based component of the hybrid model, and section 2.3 provides an overview of all models.

FLUXNET 2015 data
The flux network (FLUXNET; https://fluxnet.org), a global network of EC towers, provides estimates of energy, water, and carbon fluxes at the land surfaces across climate regimes and plant functional types (Baldocchi et al 2001, Li et al 2018. The measurements in the FLUXNET 2015 Tier 1 dataset are resolved at a half-hourly frequency. Following Reichstein et al (2005), we select only measured data and omit gap-filled data. Further, we restrict our analysis to energy-balance-corrected measurements, because the EC data do not satisfy the energy balance budget closure, which potentially introduces high uncertainty and systematic bias in our results (Wilson et al 2002). Daytime values are selected based on a threshold of sensible heat flux Q H > 5 Wm −2 and incoming short-wave radiation SW in > 50 Wm −2 to avoid stable boundary layer conditions following Lin et al (2018) and Li et al (2019). Only positive values are selected for the latent heat flux (Q LE ), net radiation (R n ), soil heat flux (Q G ), and vapor pressure deficit (VPD) for daylight data according to Zhou et al (2016). Winter months between October and March are excluded to focus on surface heat fluxes when the vegetation is active (Zhao et al 2019). The FLUXNET sites chosen include three forest and three grassland sites with varying climatic conditions and site characteristics (see table 1 in supplementary information).

The physically-based component: PM equation
Various process-based models exist for the estimation of Q LE . They can be subdivided into energy, mass transfer-based methods, water balance methods, and aerodynamic methods (Brutsaert 2005, Zhao et al 2013. One prominent example is the PM equation (Penman 1948, Monteith 1965) that provides the theoretical basis for determining Q LE and its response to changing climate and vegetation conditions (Monteith and Unsworth 2013). The estimation of Q LE can be traced back to the model proposed by Penman (1948), which combines the energy balance and mass transfer approaches to estimate evaporation from open water surfaces. The model was later extended to vegetative surfaces (Monteith 1985, Monteith and Unsworth 2013, Vialet-Chabrand and Lawson 2019. The PM equation describes the latent heat flux Q LE (Wm −2 ), where R n and Q G are measured in (Wm −2 ), r s and r a are estimated in (sm −1 ), s c is the slope of the saturation vapor pressure-temperature relationship (kPa C −1 ), e s − e a is the VPD of air (kPa), ρ a is the mean air density at constant pressure (kg m −3 ), c p is the specific heat of dry air at constant pressure (1004.834 Jkg −1 C −1 ), and γ is the psychrometric constant (kPa C −1 ).

Overview of models
The following subsections present the different models used, which differ in their approach towards being more data-or theory-driven. Each subsection describes in detail the structure of and differences between the models. All models were randomly initialized and drawn from a uniform distribution.

Inverted PM and pure ML model
The PM equation is considered to be physics-based, since core physiological and aerodynamic factors describe the evaporative process (Jain et al 2008). The equation highlights the relationship between evapotranspiration and surface conductance, which is regulated by the leaf stomata to minimize the water loss to the atmosphere (Hetherington and Woodward 2003, Damour et al 2010, Gerosa et al 2012. Different approaches model surface conductance at the leaf level with various success. The determination of surface conductance at the canopy scale, however, is even more challenging due to canopy heterogeneity and variability in microclimate within the canopy (Bonan et al 2011, Lin et al 2018. A common approach is to invert the PM equation for r s to obtain the bulk surface resistance and understand its variations assuming that aerodynamic resistance r a is known; a strong assumption as we will revisit later. The inverted PM equation (PM Inv) is used to quantify canopy parameters and expresses the relative significance of advective and radiative energy for Q LE as a function of the ratio of surface to aerodynamic resistance , Zeppel and Eamus 2008.
The inversion of the PM equation, leads to highly unstable estimates of the resistances. Therefore, we restrict surface and aerodynamic resistance values derived using PM inversion and empirical formulations (Knauer et al 2018) based on intervals that are physically realistic (0-2000 sm −1 and 0-500 sm −1 , respectively).
The estimates for r s from equation (2) derived through inverting the PM equation are referred to here as the PM Inv model. Values for r a are estimated using the Big Leaf formulation from Knauer et al (2018), which calculates r a as the sum of aerodynamic resistance for momentum (r am ) and canopy boundary layer resistance for heat (r bh ) and where WS is wind speed (ms −1 ) and U * is friction velocity (ms −1 ). The PM Inv model represents a baseline physical model for comparison against pure data-driven models for Q LE . The pure ML model for Q LE is set up to evaluate predictions against hybrid models. The pure ML model consists of a feedforward NN (FNN) (figure 1), and details about the hyperparameters of the model are found in table 2 of the supplementary information. The r s is calculated from Q LE predictions from the pure ML model by using PM Inv, and r a is estimated using the ad hoc formulation (equation (5)) approach. This model is purely data-driven and does not contain any physical constraint regarding Q LE .

Under-constrained hybrid model
The hybrid model estimates Q LE using the PM equation (equation (1)), where the two intermediate variables r s and r a are estimated by two FNNs (figure 1). The variables used for predicting r s are air temperature (TA), water availability index (WAI), incoming shortwave radiation (SW in ), mean incoming shortwave potential (SW pot sm ), VPD, and R n . The WAI is calculated as the annual cumulative difference between Q LE and precipitation (P). The WAI at time t (WAI t ) is calculated from the difference between Q LEt and P t added to WAI at the previous time step The variables for predicting r a are WS and U * . The input variables chosen for the latent variables r s and r a were selected based on variables included in the Big Leaf and PM equations and physical intuition and interpretability through manual tuning of parameters. The predictors are normalized using the mean and standard deviation of the training dataset. Thus, the hybrid model first predicts the intermediate (or latent) variables r s and r a and uses them to estimate Q LE based on the PM equation. The hybrid model predicts Q LE that exist between the initial input and resulting output phases in one step (Reimers and Requena-Mesa 2020). The loss function minimizes the difference between predicted and observed Q LE and is defined as the mean absolute difference between the model predictions and observations with n sample size, and parameters θ for r s and r a min θr a ,θr s We use the mean absolute error as it is less sensitive to outliers than the mean squared error. The pure ML model and the hybrid model both optimize against Q LE as highlighted in the loss function (equation (7)), however, the hybrid model estimates r s and r a as intermediate variables and uses the PM equation to estimate Q LE . While the pure ML model directly predicts Q LE without the physical constraints imposed by the PM equation.
Although the two FNNs for r a and r s take different predictor variables, the hybrid model is underconstrained when simultaneously estimating the two intermediate variables using only one target Q LE . The proposed hybrid model thus suffers from an equifinality problem. The issue of equifinality, or non-uniqueness, occurs when different model parametrization or structures result in equivalent representations of the system (Beven 2006, Schmidt et al 2020. Thus, many different combinations of r s and r a can result in the same Q LE value (figure 2).

Constrained hybrid models: a priori and multi-task learning models
The identification and elimination of equifinality, in the physics-based component is one of the key challenges in hybrid modeling . One way to reduce equifinality is to restrict the parameter space Figure 1. Schematic overview and classification of all models with respect to being more theory-and/or data-driven, as well as the strengths of the constraints on the loss function. The color coding represents the distinct and individual NN used for the latent and target variables. The pure ML model consists of an FNN to predict QLE with no physical constraints. The hybrid models consist of two individual FNNs, which estimate rs and ra separately with independent input climate variables. These latent variables are used in the Penman-Monteith equation to estimate QLE, and differ based on the regularization method used in the loss function. The unconstrained hybrid model suffers from equifinality. The a priori hybrid model is more strongly constrained with a weighted ra from the Big Leaf model and is more theory-driven. The relative importance of ra in the loss which reflects the influence of Big Leaf theory on the a priori model is regulated by ϕ. Based on multiple model runs, the ϕ value is selected to have a minor influence of prior knowledge in the loss function. The multi-task learning model is constrained with more information from learning an additional observation QH and is more data-driven. WS is wind speed (ms −1 ), and U * is friction velocity (ms −1 ). Rn is the net radiation (Wm −2 ) VPD, is the vapor pressure deficit of air (kPa), WAI is the water availability index calculated in equation (6), TA is air temperature ( • C), SW in is incoming shortwave radiation (Wm −2 ), and SWpot sm is mean incoming shortwave potential (Wm −2 ). through model regularization. This can be achieved through two approaches; by including either additional theory or data via additional loss terms. The integration of a priori knowledge in the loss function (i.e. a regularization) induces an a priori constraint on r a in the hybrid model (figure 1) based on the empirical formulation presented in equation (5), as the formulation for r a is considered to be more robust than for r s . To do so we regularize the loss function by adding a constraint on the loss minimizing aerodynamic resistance Loss (r a , r a )/ϕ. The relative importance of r a in the new loss is regulated by ϕ, which is varied between the high influence and low influence of the constraint. Based on multiple model runs with varying values for ϕ, we select a value for ϕ to only impose a low influence in the overall loss function.
Another way of restricting the parameter space is by extending the framework to model auxiliary target variables, whereby the auxiliary tasks help to regularize the problem objective (Liebel and Körner 2018). Since the sensible heat flux (Q H ) is also dependent on the aerodynamic resistance r a , we explore a multitask learning approach by restricting the parameter space through modeling Q H and Q LE simultaneously (figure 1). The estimation of Q H is based on the resistance formulation where T S and T A are surface and air temperature respectively. The T S is estimated using the Stefan-Boltzmann equation where Q LWout is the outgoing longwave radiation (Wm −2 ), σ is the Stefan-Boltzmann constant (5.789 × 10 −8 Wm −2 K −4 ) and ϵ is emissivity (dimensionless). The emissivity ranges from 0 to 1, and the values chosen were based on selecting models with the highest predictive accuracy.

Evaluation
We evaluate four models,i.e. one pure ML model, one under-constrained hybrid model (i.e. with no strategy to decouple r a and r s ), and two constrained hybrid models. The constrained hybrid models either consist of an a priori constraint on r a or use a multitask learning approach.

Evaluation of the learned latent variables r s and r a
We evaluate the impact of the Q LE -controlling resistances r s and r a which are treated as intermediate variables in our hybrid approach. Note that the models are driven by all relevant predictor variables for r s and r a , respectively (cf figure 1), and only for evaluation and interpretability puproses do we plot predictions against individual predictor variables. Based on the strongest dependencies discovered by our methods, we first plot the inferred estimates of r s and r a against the key meteorological drivers, namely VPD and the frictional velocity U * , respectively (figures 3 and 4). The behavior of r s against VPD is consistent across all the models and reflects a similar behavior as presented forQ LE ( figure 5). The predicted r s shows a subtle increase at lower ranges of VPD, reflecting that stomata are still open for gas exchange with the atmosphere. However, as VPD increases, the stomata start to close and thus surface resistance increases sharply (Massmann et al 2019). Further, we find that r s is generally lower for grasslands, which explains the generally higher estimates of Q LE in comparison to forests (figure 5). Another striking finding is that the models seem to be able to identify differences in the physiological functioning across different plant types in controlling r s . For instance, the inferred relationship of r s  . Assessing latent variables rs and ra against VPD and U * respectively for different models in grasslands. The constrained hybrid models yield more physically consistent results compared to under-constrained model and are able to capture the vegetation and climate heterogeneities. The colored lines represent the smoothed lines that fit a polynomial surface using local fitting for each site. The contour lines represent 2D kernel density estimate. and VPD is very similar for the two forest sites DE-Tha and FR-LBr, which are dominated by evergreen needle-leaf trees, but it is quite different for the more arid site FR-Pue, which is dominated by evergreen broad-leaf trees (figures 3(a)-(c)). There, the hybrid models show that on average r s rises more steeply with increasing VPD but flattens out at very high VPD (compare fit lines in figures 3(a)-(c)). Future research is needed to determine whether this behavior actually reflects the plants' mechanism for preventing leaf overheating by maintaining some evaporative cooling through the stomata (Lin et al 2017), or whether it is just an artifact of too sparse data at high VPD. Overall, the inferred r s through hybrid modeling (figures 3(a)-(c)) is much more precise than its conventional derivation by inverting the PM equation while making assumptions for r a ( figure 3(d)). This aspect constitutes a key advantage of our hybrid approach as opposed to the inversion method, where artificial noise in the flux measurements directly propagates into the inverted estimates of r s resulting in high artificial variability and a bias in r s ranging from 0% to 30% (Wehr and Saleska 2021). The inferred relationship for r a against its key driver U * is not consistent across the hybrid models. The two constrained hybrid models, i.e. multitask learning (figure 3(f)) and a priori constraint (figure 3(g)), consistently reflect the expected negative logarithmic relationship of r a against U * (figures 3 and 4). In particular, in the case of the hybrid multitasking model, this result is promising because the relationship emerges from the observational data alone, without inducing any prespecified knowledge. Furthermore, the two constrained hybrid models show variations of the r a relationship across the sites (figures 3(f), (g) and 4(f), (g)). Thus, they are capable Figure 5. Evaluating QLE predictions against VPD for different models for forests (a)-(d) and grasslands (e)-(h). Higher evapotranspiration rates are evident for grasslands compared to forests associated with higher stomatal conductance. The colored lines represent the smoothed lines that fit a polynomial surface using local fitting for each site. The contour lines represent 2D kernel density estimate. The under-constrained hybrid model (figure 3(e)), however, illustrates the risk of equifinality and the physics-violating behavior of this approach. In other words, r a exhibits physically inconsistent relationships in the under-constrained model across the sites ( figure 3(e)), while the predicted r s andQ LE retain physically plausible estimates (figures 3(a) and 5(g)-(i), respectively). The issue of equifinality is more prominent in forests than in grasslands, likely because aerodynamic resistance is less dominant in controlling Q LE in forests (figures 3(e) and 4(e); Chen and Liu 2020).
Aerodynamic resistance r a constitutes a critical link in the surface energy balance, especially under different environmental and stability conditions, as it has a bearing on both, Q LE and Q H . Uncertainties in Q LE and Q H mainly arise from the uncertainty in estimating r a for both dense and sparse canopy, and particularly under arid and semi-arid conditions (Trebs et al 2021). Our multi-task learning hybrid model, however, is able to provide fairly high accuracy for Q LE and Q H predictions for grasslands under unstable and semi-arid conditions without overestimating r a , which has been proven difficult in other modeling efforts (Trebs et al 2021). For example, the predictions for Q LE (figure 5) and Q H (figures 6(c) and (d)) at the US-Var grassland site, characterized by a dry Mediterranean-type climate Baldocchi 2004, De Kauwe et al 2017), are fairly accurate and relate to physically consistent r a predictions.
To get an estimate of the structural (epistemic) uncertainty for the inferred relationships for r s and r a , we train each model five times with random initializations (refer to section 2.3). The hybrid models show consistent predictions for the relationships for r s and r a at mean diurnal scale across the model runs with different initializations. The under-constrained hybrid model is consistent in producing physically non-interpretable r a for all initializations. The constrained hybrid models, on the other hand, are able to consistently reproduce the physically plausible relationships for r s and r a , especially at forest sites. Hence, our hybrid modeling approach yields robust predictions, yet we stress the caveats related to equifinality in these under-constrained model setups.

Evaluation of the target variablesQ LE and Q H
We evaluate the predicted Q LE (Q LE ) from all the hybrid models and the pure ML model against observed Q LE (Q LEobs ) at a half-hourly scale and at 7 d mean aggregates (mean diurnal) for forest (figure 7) and grassland (figure 8) sites. All models produce similar Q LE patterns compared to observations with minor differences in performance. For forests (figure 7), the more flexible models, i.e. the under-constrained hybrid model and pure ML model, perform slightly better (R 2 = 0.49) than do the multi-task learning model (R 2 = 0.48) and the a priori constraint model (R 2 = 0.46). For grasslands, the performance of all models is generally better than for forests. We find that the performance of the multitask learning model exceeds the performance of the a priori constraint model and is similar to the pure ML model (R 2 = 0.74-0.75) (figure 8). This finding could indicate that our theory-based constraint for r a might be too rigid and is not supported by the flux observations. Overall, the RMSE ranges from 70 to 73 Wm −2 for forests and 60-71 Wm −2 for grasslands at a half-hourly scale for all models. The MAE at halfhourly measurements range from 50 to 53 Wm −2 for forests and from 43 to 48 Wm −2 for grasslands for all models. The multi-task learning model provides predictions for Q H (Q H ) (figure 6) of similar accuracy compared to the Q LE predictions for all sites (figures 7 and 8), reaching R 2 = 0.53 for forests and R 2 = 0.68 for grasslands sites at a half-hourly scale. Overall, the pure ML model slightly outperforms the hybrid models for both forest and grassland sites, owing to its flexibility and non-parametric attributes as it only minimizes the loss of Q LE , without being constrained by the PM equation. More information on the physical interpretability ofQ LE assessed against meteorological variables can be found in the Suppl. Info. Sec. 3.3.
We evaluate the hybrid models' consistency with respect to the interannual variability of Q LE for the different sites. The interannual anomalies are calculated as the difference between the average annual estimates of Q LEobs in the training dataset and the annual estimates of Q LEobs andQ LE in the validation and test dataset for the EC data and models, respectively, to evaluate the predictive capacity of the different models (Jung et al 2009, Besnard et al 2019. Figures 7 and 8 show the overall fit and performance of the models in predicting interannual anomalies of Q LE compared to observed anomalies of Q LEobs . The values of R 2 range between 0.47 and 0.49 for the inter-annualQ LE anomalies for forests and thus exhibit a comparable performance at half-hourly frequency (R 2 ranges between 0.46 and 0.49) (figure 7). We observe a similar behavior at grassland sites: R 2 ranges between 0.65 and 0.75 at the half-hourly scale and between 0.62 and 0.74 for the interannual Q LE anomalies ( figure 8). Overall, the evaluation of the models at multiple temporal scales shows that the models are capable of learning not only the predominant structure of the diurnal and seasonal cycle, but also the more subtle year-to-year anomalies. The presented consistency reflects that the models learn the physically correct dependence of the meteorological predictor variables controlling Q LE (figure 5).

Conclusions
We present a new approach to end-to-end hybrid modeling of latent heat fluxes that can simultaneously retrieve the two controlling intermediate variablesthe surface (r s ) and aerodynamic resistance (r a )while maintaining physical consistency across different vegetation types. The hybrid models provide reliable predictions against measurements of latent heat fluxes at different time scales, ranging from daily to seasonal to interannual variations. This crossscale consistency shows that our model framework is able to learn the physically consistent dependencies between the meteorological input variables and the target fluxes, rather than just the dominant structure of diurnal and seasonal cycles.
The main novelty and outcome of this study are the data-driven parameterizations for r s and r a jointly estimated by two separate NNs, which can lead new insights on biophysical regulation of surface evaporation. We show that the NNs together can provide many solutions (non-uniqueness) and lead to physically plausible predictions for Q LE fluxes, while presenting physically implausible relationships to the predictors. This non-uniqueness can be mitigated by introducing either more data or theory into the loss function of the hybrid model. Specifically, we make use of two different approaches (a priori constraint and multi-task learning) to regularize the parameter space for the NNs. The constrained hybrid models in general yield accurate and physically interpretable predictions, with the multitask learning model estimating the target and latent variable predictions with accuracy similar to the pure ML model, but with constraints that respect the surface energy budget. Therefore, by incorporating additional observation-based information, the multitask learning model is the optimal hybrid model for the problem at hand. This architecture makes it possible to reduce equifinality and enables the model to extract underlying information from observations rather than ad hoc assumptions, while allowing the NN enough flexibility within the limits of physical interpretability of the surface energy balance.
When using the hybrid models to determine r s and r a , we find substantial differences between sites compared to the very uniform empirical formulations commonly used. This inter-site spread in the observation-based parameterizations suggests that the conventional empirical formulations are too rigid and do not account for the variability caused by the vegetation canopy structure. The hybrid models show differences among sites, highlighting in particular the different physiological functions of the different plant types, in comparison to the PM equation under the Big Leaf assumption. The resulting relationships for r s and r a not only show physically consistent behavior across scales, but also reveal new insights into how the varying resistances control surface energy fluxes. By evaluating the relationship of r s and r a to the driving meteorological variables, we are able to identify the effect of structural differences between forests and grasslands. The general response of stomatal conductance to VPD and photosynthesis in forest and grassland ecosystems is more aligned with the optimality theory as it considers the interactions between transpiration and carbon assimilation. However, grasslands tend to show a weaker dependence of stomatal conductance on photosynthesis and VPD that can be attributed to structural vegetation differences of the leaf area index and significantly larger weight and impact of r a on r s . r a is generally higher for grassland sites than for forests which is attributed to the surface roughness of leaves. r s is higher for forest sites compared to grasslands owing to the different atmospheric demand of the canopy and water uptake through roots that highlight the functional balance between shoots and roots under water-stressed conditions. In addition, we detect that these learned parameterizations in the hybrid models exhibit lower stomatal conductance, suggesting that the r s values usually obtained by inversion of the PM equation may be systematically overestimated.
Several approaches have already been proposed to use the growing number of observations to constrain uncertainty in mechanistic model simulations, especially for key unknown plant behavior in the coupled Earth system (Lian et al 2018, Winkler et al 2019a, 2019b, Varney et al 2020. As a next step, we propose to derive parameterizations directly from observations using hybrid modeling, as presented in this study, to replace these ad hoc formulations in Earth system models. This approach will not only help reduce uncertainty, but also advance significantly the understanding of biogeophysical and biogeochemical processes in land-atmosphere coupling.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: 10.5281/zenodo.7078500. Data will be available from 14 March 2023.

Code and data availability
All data used in this study are available from public databases or the literature, which can be found with the references provided in the respective 'Data and methods' subsection. Processed data and analysis scripts are available from the corresponding author upon request, and the repository will be published together with this article. Correspondence and requests for materials should be addressed to Reda ElGhawi (relghawi@bgc-jena.mpg.de).