Interchangeability of multi-decade skin and surface air temperature trends over land in models

Satellite land surface temperature (Ts ) records have now reached 20+ year length, but their trends may differ from historical records built from in-situ measurements of near-surface air temperature (Tas ). In the ERA5 reanalysis, 60° S–60° N land Ts and Tas trends can differ by up to ±0.06 °C decade−1 over 20 years, depending on the period, or more on smaller spatial scales. Here I use 1979–1998 outputs from ACCESS1-0 climate model simulations with prescribed land Ts to understand changes in Ts and Tas . CO2’s effective radiative forcing causes adjustments that warm Tas relative to Ts . In ACCESS1-0, vegetation enhances the adjustments to CO2 over land. Meanwhile, feedbacks in ACCESS1-0 oppose the adjustments, resulting in small long-term net effects on global temperature estimates. In coupled simulations from other models, there is no agreement on whether Ts or Tas warms more and the most extreme case shows global long-term differences of just 5% between land Ts or land Tas trends. The results contrast with over-ocean behavior where adjustments and feedbacks reinforce each other, and drive larger long-term Tas warming relative to Ts across all models.


Introduction
During 2023 European heatwaves, satellite retrievals of land surface temperature (LST or land T s ) exceeded 60 • C1 , far in excess of the European near-surface air temperature (T as ) record of 48.8 • C accepted by the World Meteorological Organisation in 2021 2 .The surface energy balance, in particular surface absorption of sunlight, establishes a surface temperature discontinuity δT s = T s − T as , with T s > T as on average and, in extreme situations such as heatwaves, by substantial amounts.However, δT s varies greatly and the long-term changes in over-land δT s in response to global warming have not been explored in detail.
By contrast, over-ocean δT s changes have been studied more since early Intergovernmental Panel on Climate Change (IPCC) reports generally considered global T as in models, while ocean data were of sea-surface temperatures.Since model T as consistently warms by more than T s , model-observation comparisons of warming were biased (Cowtan et al 2015).Accounting for sampling differences including T s versus T as usage greatly reduced an apparent model-observation divergence in warming amounts and estimates of transient climate response (Richardson et al 2016).
With the emergence of multi-decade satellite T s records, a similar issue may now arise over land.For example, the Atmospheric Infrared Sounder (AIRS, Chahine et al (2006)) was launched in 2002, and Susskind et al (2019) were able to 'confirm' global warming trends over 2003-2017 by comparing AIRS T s with T as from European Reanalysis-Interim (ERA-Interim) and in-situ products.Figure 1 shows that in the ERA5 reanalysis (Hersbach et al 2020), T as (figure 1(a)) and T s (figure 1(b)) trends are spatially similar during 2003-2022, but regional differences exceed ±0.2 • C dec −1 (figure 1(c)).For the Susskind et al period 2003-2017, the nonpolar (within ±60 • S/N) land T s trend, as would be retrieved by AIRS, was just 0.01 • C dec −1 more than for T as , but larger differences occurred during other periods, such as a difference of 0.06 • C dec −1 during 1985-2004 (figures 1(d)-(f)).Regional differences can be larger, such as in figure 1(c) where the Congo basin T as trend of +0.39 • C dec −1 is approximately 70% more than the T s trend of +0.23 Research that compares T as and T s changes may misinterpret results as evidence of disagreement between datasets or models and observations, even if δT s has changed for physical reasons.For example, studies on the effects of deforestation have compared model T as with satellite-retrieved T s , even though deforestation affects each of them differently (Chen and Dirmeyer 2019).Understanding δT s changes is important since δT s helps to control surface heat and moisture fluxes.In an attempt to better understand how and why T s and T as trends can differ, I address two related scientific questions: (i) How similar are climate model trends in T s and T as ?(ii) If the trends differ, can we determine how differences are driven by CO 2 forcing, long-term feedbacks and internal climate variability (ICV)?
Other work has considered present-day relationships between T s and T as (Vancutsem et al 2010, Nielsen-Englyst et al 2019) but present-day relationships between T s and T as may not apply to long-term warming, predominantly driven by CO 2 .The present paper contributes by studying the δT s response to CO 2 forced warming.I ignore satellite-specific issues, such as possible biases in surface emissivity or infrared sensor sampling being limited to non-overcast conditions (Prigent et al 2016, Prakash et al 2018, Xu et al 2023).
The present analysis is possible thanks to the prescribed land atmospheric model intercomparison project (PLAMIP) protocol, which so far has only been used by ACCESS1-0 (Ackerley andDommenget 2016, Ackerley et al 2018).As described in section 2.1, by prescribing land T s with increased CO 2 , PLAMIP permits calculation of the over-land response of T as to CO 2 forcing, henceforth the effective radiative forcing (ERF) adjustment.By contrast, standard AMIP simulations published for other models allow land T s to change, thus introducing a feedback component into the ERF calculation.
Over oceans, Richardson (2023) showed how ERF adjustments and feedbacks both reduced model δT s .CO 2 ERF adjustments include atmospheric warming with fixed T s , so δT s < 0 (Kamae et al 2015).The dominant net feedback was evaporative cooling of T s , which further reduces δT s .A land-specific investigation is warranted since Richardson (2023) used fully-coupled model simulations to diagnose over-ocean ERF adjustments using the method of Gregory et al (2004).The method reproduced over-ocean behavior in independent simulations, but failed to quantify over-land behavior.I will also report regional results since satellite data is often used to study otherwise data sparse regions (Aguilar-Lome et al 2019, Pepin et al 2019).
Recently, Kan et al (2023) reported that satellite T s warmed less than in-situ T as during summer daytime, but the opposite occurred at night.They attributed the results to vegetation increasing turbulent surface cooling during the day.PLAMIP includes simulations with and without vegetation response to CO 2 , to study related processes.To put the single-model PLAMIP results in context, I will refer to outputs from climate model large ensembles (LENS).The LENS outputs will highlight inter-model differences and plausible ranges of observed T s and T as changes due to ICV, representing a more realistic study than provided by the artificial constructs of PLAMIP simulations.
The paper is organised as follows, section 2 introduces the model outputs and analysis methods, section 3 the results, section 4 discusses and section 5 concludes.

Methods, data and interpretation of feedback results
The main fields used here are T s (ts in CMIP nomenclature) and T as (tas).All model outputs are annually averaged and re-gridded to 2.5 • in latitude-longitude.Nonpolar land and ocean averages are the area-weighted means between 60 • S-60 • N, scaled by land fraction in each 2.5 • grid cell derived from the 0.125 • ERA-Interim land-sea mask.

PLAMIP analysis
PLAMIP builds on the AMIP protocol, in which climate models are run with prescribed ocean T s and sea-ice coverage following historical time-varying data.Forcing and feedbacks are quantified from pairs of simulations.For example, the difference between a run with historical period-mean CO 2 (the 'baseline' simulation), and one with a higher CO 2 level is traditionally called the Hansen method ERF, Hansen et al (2005).Meanwhile, the difference between the baseline simulation and one with a uniform +4 • C warming of ocean T s is used to calculate feedbacks, which are typically reported per • C.
The IPCC defines ERF as changes in response to a forcing agent with ∆T s = 0 • C.However, in standard AMIP the freely-varying land T s typically warms in the 4xCO 2 simulation, biasing the derived ERF (Andrews et al 2021).In PLAMIP the land soil properties and T s in the 4xCO 2 simulation are prescribed to match that of the baseline simulation at 3 hour resolution, thus ensuring that ∆T s = 0 • C between the baseline and 4xCO 2 simulations.Additional PLAMIP simulations used here include following historical T s , but with no vegetation CO 2 response, with uniform +4 • C land and ocean T s, , and uniform ocean warming but with land allowed to respond freely (see supplementary table 1).The uniform +4 • C land-warming simulation uses the baseline soil moisture series, which could result in surface flux changes that are different from what would actually occur under warming of that magnitude.Furthermore, T s is not prescribed over ice, but the consequences should be minimal for our nonpolar land results.For PLAMIP the surface sensible heat (hfss) and latent heat (hfls) fluxes are also used to interpret vegetation effects.
The PLAMIP simulations are over 1979-2008, but only 1979-1998 are used here because output for one ensemble member is erroneous in the final decade (see erratum in Ackerley (2019)).
ERF adjustments and feedbacks are estimated from differences between the simulations listed in table 2 of section 3.4, and the baseline simulation.For example, the simulation with quadrupled ('4x') CO 2 but prescribed land ('pl') and ocean T s is labeled A4xpl, and subtracting the baseline simulation (Apl) returns the ERF adjustment for quadrupled CO 2 .The result is divided by 2 to obtain the canonical doubled-CO 2 forcing adjustments (approximating the 0.476 scaling adopted by the IPCC, see table 7.2 in Forster et al (2021)).An AMIP-like ERF is calculated using A4x minus Apl, where A4x has freely varying land T s so local land feedbacks also occur.For feedbacks, I calculate changes between simulations for every grid cell, labeled with a ∆ (e.g.∆δT s ), and then divide that by the grid-cell ∆T s .Results are similar when dividing by nonpolar land mean ∆T s (not shown).

LENS analysis
Later results will show that over 20 year periods, the effect of ICV on δT s can be similar to the forced response.To compare with PLAMIP results, which represent forced changes in δT s , I select models with at least 20 simulations, and average across them to isolate the forced response.I use the coupled model intercomparison project, phase 5 (CMIP5)-era large ensemble (LENS) runs from Richardson (2022), which is all CMIP5 models with ⩾20 runs that provide both ts and tas.The simulations are fully coupled, with freely varying T s and T as , and forcing following the CMIP5 historical scenario from 1970-2005 and the representative concentration pathway 8.5 (RCP8.5)for 2006-2100.The models used are CESM1-CAM5, CanESM2, MPI-ESM, CSIRO-Mk3.6.0,GFDL-CM3 and GFDL-ESM2M, all taken from the multi-model large ensemble archive (Deser et al 2020).The CMIP5 collection contains the largest number of models with ⩾20 runs following the same scenarios.A test with a CMIP6 large ensemble from CESM2 (Rodgers

Region name
Latitude-longitude box shows results within the CMIP5 range, so conclusions are not expected to change greatly with model generation (supplementary figure 1).
For each run its initial decade mean is subtracted to obtain anomalies ∆T s and ∆δT s .The ensemble mean in each year is the estimated forced change, and each run's differences from the ensemble mean are taken to represent ICV.In addition to trends, changes in forced and ICV-related ∆δT s will be regressed against forced and ICV-related ∆T s to determine whether there are differences between long-and short-term feedback responses.
The final part of the LENS analysis is to calculate 20 year trends for each run in the selected regions listed in table 1.The mean and standard deviation of each ensemble's δT s trends will be reported, identifying regions that models suggest will be variable or where models disagree.

AMIP versus PLAMIP adustments to CO 2 ERF
In figure 2

Vegetation effects on ERF
All else being equal, one Watt of sensible heat (H) will warm T as by more than one Watt of latent heat (L), since H represents heat transfer into the near-surface layer and directly warms T as .Meanwhile, latent heat is released via condensation higher up in the atmosphere, and potentially after horizontal transport, so the warming effect on T as is generally smaller.Plants can respond to higher CO 2 levels via stomatal closure and reduced evapotranspiration, causing a relative shift from L to H (Drake et al 1997, Keenan et al 2013, Park et al 2020).Under higher CO 2 figure 3 shows that ACCESS1-0 with interactive vegetation has flux responses consistent with stomatal closure in response to higher CO 2 levels (figure 3(c), note that figure 3(a) is the same as figure 2(b)).The simulation in which vegetation cannot respond does not show the same evapotranspiration-driven responses.With the exception of parts of India and the north of Eurasia, the strongest vegetation-driven response occurs in highly vegetated areas such as the Amazon and Congo basins.The shift to sensible heat generally strengthens the negative adjustment in δT s (figure 3(g)).

AMIP versus PLAMIP feedbacks
Local δT s feedbacks are displayed in figure 4 for three simulations: PLAMIP +4 • C (uniform land and ocean warming), AMIP +4 • C (uniform ocean warming, land adjusts by +5.86 • C) and 'ERF pattern' , in which land T s follows that of the 4xCO 2 AMIP ERF simulation from figure 2(c), with default 1xCO 2 and baseline ocean T s .In all cases the over-land feedback is positive while the over-ocean feedback is negative, but the strengths differ.
In the figure 4(a) simulation, land ∆T s = 4 • C compared with figure 4(b)'s +5.86 • C, so the latter case results in larger absolute ∆T as over land.The atmosphere may act to transfer heat horizontally and mitigate the land-ocean gradient change, contributing to why AMIP land λ δTs is smaller than PLAMIP land λ δTs .I will later refer to this as the leakage hypothesis.
Figure 4(c)'s ERF warming pattern includes land warming relative to the ocean, so the leakage hypothesis would predict a λ δTs that is smaller than PLAMIP (figure 4(a)), perhaps more akin to AMIP (figure 4(b)).However, the figure 4(c) λ δTs is larger than in PLAMIP, which contradicts the leakage hypothesis.A 'leakage' hypothesis does not require direct physical transfer of heat between land and ocean, but that circulation changes can have remote effects, as has been discussed for climate feedbacks in the so-called 'pattern effect' (Andrews et    The PLAMIP feedback strength in figure 4(a) is potentially biased by its artificial treatment of soil moisture.For the other simulations, soil moisture is consistent with land model T s , but in the PLAMIP +4 • C simulation, soil moisture follows the baseline simulation.The soil moisture changes could explain why PLAMIP H decreases (−1.63 W m −2 • C −1 ) while AMIP H increases.Following the arguments in section 3.2, the positive AMIP ∆H could also contribute to the less-positive λ δTs .However, the soil moisture is consistent with land T s in figures 4(b) and (c), but the λ δTs , differs, implying that over-land δT s feedback effects are nonlinear and sensitive to the pattern of warming.

Region-mean PLAMIP results
Region-mean results for select PLAMIP simulations are displayed in table 2, with ERF adjustments reported for doubled CO 2 and feedbacks calculated locally and then area-weighted.Reported ∆T s is either the prescribed value, or for interactive-land cases is calculated from the area-weighted mean, which introduces Table 2. Changes in region-mean properties relative to the baseline simulation for Ts, Tas and δTs, and the derived effective radiative forcing (per doubled CO2) or feedback (per 1 • C change in local Ts).The leftmost column is the scenario name following the PLAMIP convention, from which the baseline simulation is subtracted.The rightmost column describes what is calculated.The feedback entries are dashed if there is no prescribed warming in that region.some minor errors due to the treatment of coastal cells.For example, in the 4xCO 2 simulation the 'true' ocean change is +0 • C, but coastal cell contamination results in a reported +0.05 • C. The fixed-CO 2 simulations with a land-ocean warming contrast (rows 4,6,7,8) show 'leakage' between land and ocean.An extreme case is row 7, where ocean ∆T s = 4 • C and land ∆T s = 0 • C. Despite ocean adjustments and feedbacks causing ∆δT s < 0 in every other simulation, the large land-ocean contrast causes the opposite in this case, with a 1.01 • C warming of land T as and ocean ∆T as < 4 • C so that ocean ∆δT s > 0.

Change in Ts (
Table 2 shows that ERF adjustments and feedbacks counteract each other over land in ACCESS1-0.As discussed above, the commonly used linear forcing-feedback construction does not appear to work accurately for δT s over land, since λ δTs changes between simulations, so I only use it to illustrate arguments rather than quantify and attribute changes.Over-land ∆δT s clearly depends on the spatial pattern and potentially the magnitude of warming.The ERF pattern ∆T s discussed in section 3.3 and reported on row (8) is just 1.5 • C on average, but sees a larger total change in δT s than the +5.9 • C seen in A4K.There are differences in spatial patterns of changes between the low-warming and high-warming simulations for H, L, relative humidity and absorbed shortwave radiation, and while there were correlations with ∆δT s , there was no obvious physical explanation so results are not shown.
Compared with Richardson (2023), an additional insight from table 2 rows 4 and 5 is that enhanced land warming in ACCESS1-0 almost doubles the over-ocean δT s feedback.With prescribed T s over both land and ocean but no vegetation response, the 2xCO 2 δT s ERF adjustment is nearly identical over land (−0.040 • C) and ocean (−0.039 • C).Vegetation enhances the land adjustment magnitude to −0.066 • C, consistent with figure 3.

Historical changes in large ensembles
Figure 5 shows inter-model disagreement over the sign of ∆δT s over nonpolar land.Most models show small ensemble-mean changes of ±2% of ∆T s , the exception being a +5% change in CanESM2 (calculated from the gradient in figure 5(c)).The feedback response ∆δT s /∆T s from internal variability is always positive (figure 5(d)) although three of the models have a negative ∆δT s /∆T s response to forced warming (figure 5(c)).The sign difference could be explained by some combination of (i) negative ERF adjustments overwhelming feedbacks in forced warming and (ii) nonlinear feedbacks.Overall, there are only weak signs of nonlinearity in the large-scale feedbacks with GFDL-CM3 in figure 5(c) showing a change from linearity near +6 • C.
Figure 6 shows statistics of 20 year regional trends calculated across each large ensemble, mean trends in figure 6(a) and their standard deviation in figure 6(b).Regions where models show similar behavior across ensembles include the Sahara, Amazon and Congo, while the continental United States (CONUS) and Australia show larger inter-model disagreement.Of more concern for comparing satellite T s with in-situ T as trends, the ensemble standard deviation of regional trends in figure 6(b) can be large, with the most extreme case being a 2σ range of over ±0.3 • C dec −1 in GFDL-ESM2M for Australia.

Discussion
Climate models show consistently different warming trends between T s and T as over ocean, of order 5%-10% of ocean ∆T s , which was explained by Richardson (2023) as being due to a combination of CO 2 ERF adjustments and evaporation feedback.This paper extends the analysis over land, using new model tools to propose explanations for multi-decade changes in T s versus T as trends.The work complements Kan et al (2023) who reported on observed trend differences.

PLAMIP shows how AMIP methods mis-diagnose CO 2 adjustments to δT s
The PLAMIP results demonstrate why standard methods for diagnosing ERF adjustments and feedback contributions to ∆δT s may perform poorly over land.Richardson (2023) highlighted the poor over-land performance of an ERF diagnosis method based on coupled simulations.Figure 2 showed how the standard Hansen method to diagnose ERF adjustments using AMIP simulations can also be misleading, since AMIP land T s warming activates local feedbacks and biases derived ERF adjustments.In ACCESS1-0, land warming changes the 2xCO 2 TOA ERF from 3.5 W m −2 to 4.0 W m −2 (Andrews et al 2021).The effects are larger for δT s and erroneously flip the sign of the mean adjustment.The PLAMIP results show that land adjustments are in fact consistent with standard and well-established understanding of atmospheric adjustments to CO 2 .Consistent with Kan et al (2023), vegetation can adjust δT s .Figure 3 supported the hypothesis that CO 2 fertilisation reduces evapotranspiration, so increased H directly heats T as and enhances the negative ∆δT s .

Feedbacks
The table 2 and figure 4 results showed land λ δTs = ∆δT s /∆T s was positive in all simulations, but since λ δTs varied with ∆T s the standard forcing-feedback decomposition, which is linear in T s , does not apply to δT s over land.The results imply some combination of nonlinearity in local feedbacks and/or dependence on the spatial pattern of warming.There were no clear signals in the surface energy budget to identify the process causes, but examples of nonlinear processes that affect land temperature include convection, surface albedo change and vegetation interactions (Good et al 2015).Stable boundary layers tend to warm more easily in response to surface heating, compared with convective boundary layers, and regions can transition from one to the other, for example through Hadley cell expansion or contraction (Previdi andLiepert 2007, Zhu et al 2023).Drastic warming or accompanying changes in precipitation could push plants outside their thermal tolerances and result in regime changes.
Table 2 also supports a leakage effect, when land T s warms by more than ocean, the over-ocean T as sees additional warming and therefore δT s is more negative than in a uniform warming case.This leakage is unlikely to be from direct horizontal heat transfer but rather fully-coupled changes in circulation that interact with the surface energy budget.Richardson (2023) asserted that evaporation was the dominant cause of why model ocean feedbacks further drive ∆T as > ∆T s .That conclusion does not contradict the section 3.2 argument where vegetation-induced changes in H were emphasised, rather than L. It is simply that oceanic H does not increase rapidly with warming T s , while evaporation does.For feedbacks, Richardson (2023) considered surface flux responses to ∆T s , ∆T as and ∆δT s separately (termed λ s , λ as and λ δs ).For example, T s warming increases longwave (LW) emission via the Planck feedback, which for nonpolar oceans averaged λ s,LW = 5.7 W m −2 • C −1 .It was calculated that the sign of ∆δT s is only dependent on the relative strength of λ s and λ as .Specifically, it was derived that the threshold requirement is that the feedback to surface temperature be at least twice as strong as the feedback to T as , i.e. we require that λ s > 2λ as in order for ∆δT s to be negative.Meanwhile λ δs modulates changes in δT s , but does not affect its sign.
Radiative kernel analysis showed that the non-cloudy part of the atmosphere has a combined LW plus shortwave (SW) feedback of λ as,LW + SW = 5.8 W m −2 • C −1 over ocean (Soden et al 2008, Smith et al 2018), approximately balancing the λ s,LW term.The cloud and surface albedo feedbacks were insufficient to tip the balance past the λ s > 2λ as threshold.Over oceans, if all else is held constant, then L would increase at approximately 7 W m −2 • C −1 (Siler et al 2018), which is sufficient to ensure that λ s > 2λ as on average.In the PLAMIP +4 • C simulations, the derived over-ocean L feedback includes a λ δs component, since δT s changes.The result is closer to λ s + λ δs ≈ 4 W m −2 • C −1 , and since it is likely that λ δs < 0, the implication is that λ s >4 W m −2 • C −1 .ACCESS1-0 over-land L increases by <0.2 W m −2 even when subjected to ≥4 • C warming and increased δT s .The limited moisture availability over land prevents the dominant evaporative cooling feedback that occurs over ocean.Changes in cloud feedback or surface albedo could strongly affect the relative strengths of λ s and λ as and flip the sign, but the model behavior suggests that the balance favours ∆δT s > 0 in ACCESS1-0.The feedback analysis was also derived for a closed system with no net change in horizontal heat transfer or heat uptake.While most CMIP5 models studied in Richardson (2023) showed little net transfer between nonpolar land and ocean at equilibrium in coupled simulations, it could remain a factor in some models or non-equilibrium situations.

4.3.
Combined effect on δT s , inter-model and regional differences from the large ensembles In ACCESS1-0 the over-land ERF adjustments and δT s feedbacks oppose each other, resulting in a generally small forced net change in δT s as a fraction of ∆T s .The exception is for the simulation in which ∆T s follows the 4xCO 2 simulation adjustments over land, which is unlikely to represent the real-world.For five of the large ensembles in figure 5, over-land ∆δT s falls within ±2% of the warming trend.For the outlier model CanESM2, the effect on long-term global mean temperature estimates from the 5% nonpolar land T as -T s trend differences is only 0.015 • C per 1 • C of global (land + ocean) ∆T s .The large ensemble simulations include substantial aerosol forcing in the early parts of the simulations with no strongly divergent δT s response, implying a relatively minor aerosol ERF adjustment in these models.
The results overall support a small role for forcing adjustments and feedbacks in affecting long-term δT s over land, implying that a sufficiently long satellite global-land temperature record would not have substantial biases relative to the in-situ T as dataset built from land station networks.However, in the present day, users are confronted by relatively short satellite data records, and both ERA5 in figure 1 and the large ensembles in figure 6 show that meaningful trend differences are possible between T s and T as over 20 years and for large regions.The forced response in most of the selected regions is positive in δT s (i.e.T s warms more than T as ) in all models, with the exception of India where most models project the opposite, and CONUS or Australia where there is disagreement.Changes in circulation, cloudiness, moisture availability and vegetation are plausible candidates to explain regions with large differences in behavior.

Future work
For real-world applications, regions such as India and East Asia have seen large aerosol burden trends in recent decades (Ramachandran et al 2020, Hu et al 2021) and potentially adjustments may have caused δT s to behave differently there.Detailed regional analysis and single-forcing simulations such as those from the precipitation driver and response model intercomparison project, could address whether non-CO 2 forcings have notably affected regional T s -T as trend differences (Myhre et al 2022).PLAMIP-like simulations from more models and with a range of temperature changes and spatial patterns would also be helpful.Further work akin to Kan et al (2023) using combined observations interpreted relative to model outputs could provide further insight.Vegetation effects were seen in ACCESS1-0 adjustments to CO 2 , and vegetation may also respond to temperature, moisture, cloud and aerosol cover or land-use change.Further targeted simulations or data analysis (e.g.Cao et al (2023)) could identify whether the regionally-varying response of plants could also contribute meaningfully to long-term changes in δT s .
Albedo or cloudiness could also be regionally crucial since surface sunlight absorption drives increased δT s , as seen by the exceptionally large differences in T s and T as during heatwaves such as the 2023 European events.A more refined theoretical approach would account for SW radiation and other factors, such as lapse rates, atmospheric stability, and PBL properties.

Conclusions
Models suggest that 20 year land T s records, as currently available from satellite data, can have physically different trends than T as from in-situ datasets.For nonpolar land within ±60 • S/N, a difference of 0.06 • C dec −1 occurred in ERA5 from 1985-2004.If studies are regionally based, then ERA5 and large ensemble outputs suggest larger differences are possible, with the most extreme differences being ±0.6 • C dec −1 over continental Australia in the GFDL-ESM2M model.
I showed that standard AMIP methods to diagnose forcing adjustments in δT s fail, even reporting results of the wrong sign.By also prescribing land T s as in PLAMIP, it is revealed that the δT s adjustments to CO 2 forcing are negative and therefore fully consistent with understanding of CO 2 ERF adjustments (Kamae et al 2015).The opposite AMIP behavior is a result of local feedbacks to land warming in the AMIP simulation.The over-land response is nonlinear in ∆T s , suggesting that a standard forcing-feedback decomposition will be inaccurate.However, the decomposition provides qualitative understanding that can highlight processes such as the importance of vegetation CO 2 fertilisation response.
While models disagree, particularly regionally, long-term differences in land T s and T as warming do not seem likely to greatly affect derived temperature trends, unlike over oceans where forcing and feedbacks reinforce each other and drive biases in long-term warming estimates.On the other hand, scientists studying satellite records of approximately 20 year length, especially in individual regions, may have to consider that differences between satellite T s and in-situ T as are potentially physical rather than results of dataset errors alone.

Figure 1 .
Figure 1.ERA5 linear trends during (a)-(c) the 2003-2022 AIRS period and (d)-(f) 1985-2004, a period of the same length selected for its larger differences in land Ts-Tas trends.Column headed (a) is near-surface air temperature Tas, (b) is surface skin temperature Ts, (c) is the difference in Ts minus Tas.Note the change in colour scheme scale between columns headed (a)-(c).The text above each panel is the area-weighted mean trend over nonpolar land.Data from Hersbach et al (2023).
the over-land AMIP δT s 2xCO 2 adjustment of +0.03 • C is of opposite sign to the PLAMIP derived −0.09 • C. The difference must be related to the AMIP land-surface T s warming of 0.75 • C in figure 2(c).Figures 2(a) and (b) also show over-ocean ∆δT s patterns even though ocean ∆T s = 0 • C, which must be caused by responses to the land warming.

Figure 2 .
Figure 2. 2xCO2 ERF responses estimated as half the change from the baseline simulation to a 4xCO2 simulation.(a) change in δTs = Ts − Tas in AMIP, (b) change in δTs in PLAMIP, (c) change in δTs from AMIP minus that in PLAMIP.

Figure 3 .
Figure 3. Responses to 2xCO2 responses with interactive vegetation (top row) and with interactive vegetation turned off (center row).The bottom row is the top minus the bottom, showing the effect of interactive vegetation.Values derived as half the difference between the baseline and 4xCO2 simulation.Column headed (a) is δTs, (b) is latent heat and (c) is sensible heat.

Figure 4 .
Figure 4. Feedback parameters derived for each grid cell's ∆δTs divided by the grid cell's ∆Ts.(a) PLAMIP with uniform +4 • C warming over land and ocean, (b) AMIP simulation with uniform +4 • C over ocean and freely-varying land, (c) the ERF pattern ∆Ts with an average of +0 • C over ocean and +1.5 • C over land.In the ERF pattern simulation, the land ∆Ts from a 4xCO2 simulation is applied while CO2 levels follow the baseline simulation.

Figure 5 .
Figure 5. (a) ensemble mean and 5%-95% range for land Ts anomaly relative to 1970-1979, (b) the same for land δTs.(c) the ensemble mean ∆δTs as a function of ensemble mean ∆Ts, showing the forced response, including both ERF and feedbacks and (d) the same derived relationship but for the internal climate variability component of one run.

Figure 6 .
Figure 6.Region-mean trend statistics from large ensembles during a 20-year period, typical of the length of contemporary satellite land Ts records.(a) ensemble mean of trends in δTs, (b) standard deviation of the ensemble of trends in δTs.'Land' refers to nonpolar land and the other regions are defined in table 1.Note the change of scale between panels.

Table 1 .
Selected land regions and their bounding latitude-longitude boxes.