Investigating permafrost carbon dynamics in Alaska with artificial intelligence

Positive feedbacks between permafrost degradation and the release of soil carbon into the atmosphere impact land–atmosphere interactions, disrupt the global carbon cycle, and accelerate climate change. The widespread distribution of thawing permafrost is causing a cascade of geophysical and biochemical disturbances with global impacts. Currently, few earth system models account for permafrost carbon feedback (PCF) mechanisms. This research study integrates artificial intelligence (AI) tools and information derived from field-scale surveys across the tundra and boreal landscapes in Alaska. We identify and interpret the permafrost carbon cycling links and feedback sensitivities with GeoCryoAI, a hybridized multimodal deep learning (DL) architecture of stacked convolutionally layered, memory-encoded recurrent neural networks (NN). This framework integrates in-situ measurements and flux tower observations for teacher forcing and model training. Preliminary experiments to quantify, validate, and forecast permafrost degradation and carbon efflux across Alaska demonstrate the fidelity of this data-driven architecture. More specifically, GeoCryoAI logs the ecological memory and effectively learns covariate dynamics while demonstrating an aptitude to simulate and forecast PCF dynamics (active layer thickness (ALT), carbon dioxide flux (CO2), and methane flux (CH4)) with high precision and minimal loss (i.e. ALT RMSE: 1.327 cm [1969–2022]; CO2 RMSE: 0.697 µmol CO2 m−2 s−1 [2003–2021]; CH4 RMSE: 0.715 nmol CH4 m−2 s−1 [2011–2022]). ALT variability is a sensitive harbinger of change and a unique signal characterizing the PCF, and our model is the first characterization of these dynamics across space and time.


Permafrost carbon feedback
Frozen soil and carbon-rich permafrost characterize nearly 14 million square kilometers of the global terrestrial surface, with total soil organic carbon stock estimates near 1307 ± 170 PgC (Hugelius et al 2014). Across the Circumarctic, quantifying the persistent irregularities and impacts attributed to permafrost degradation remains a scientific challenge. The transitional state of permafrost and spatiotemporal ALT heterogeneity drive abrupt changes emerging from rapid, nonlinear carbon-climate feedback mechanisms. These processes are correlated with several biotic and abiotic factors throughout the tundra and boreal zones, including tundra shrub encroachment, boreal forest migration, caribou migration patterns, topography, precipitation, solar radiation, land surface temperature, and subsurface hydrologic flow (Lloyd et al 2003, Evans et al 2020, Aguirre et al 2021, Joly et al 2021). Carbon release originating from the permafrost-carbon feedback is a climate change catalyst that amplifies localized warming patterns, disrupts carbon cycle partitioning, and destabilizes land-atmosphere coupling mechanisms and feedback thresholds (Sannel and Kuhry 2011, Koven et al 2015, Schuur et al 2015, 2022).
Abrupt permafrost thaw triggers ground subsidence, thermokarst formation, and the proliferation of wetlands, ponds, and methane emission hotspots near littoral zones (Jorgenson et al 2006, 2013, Walter et al 2007, Klapstein et al 2014, Turetsky et al 2014, Olefeldt et al 2016). Across the northern latitudes, increased warming and precipitation trends continue to foster an exponential increase in permafrost carbon release, i.e. permafrost carbon loss is a function of global mean temperature change: pC loss = f(GMT) (Pastick et al 2015). Although most model projections arbitrarily terminate between 2100 and 2500 CE, it is purported that anthropogenic-induced warming, coupled with permafrost degradation and soil carbon release, may trigger a positive carbon-climate feedback and sustain a warmer, inhospitable equilibrium for hundreds of thousands of years (Koven et al 2011, McGuire et al 2018, Steffen et al 2018, Wang et al 2023).
Permafrost degradation remains a substantial contributor to the uncertainty of soil carbon release and may promote a positive feedback on the climate system (Koven et al 2015, Burke et al 2017). Synergistic relationships exist among permafrost degradation, thaw subsidence, and terrestrial carbon cycling dynamics. Despite these revelations, the physical processes governing the rate, magnitude, and extent of permafrost degradation and soil carbon remobilization remain uncertain (Koven et al 2015). The mercurial nature of thaw-induced carbon release presents many challenges to quantifying the extent and variability of this feedback (Miner et al 2021). Therefore, it is necessary to constrain drivers of change in model projections to disentangle and interpret the PCF signal (Lawrence et al 2015, Song et al 2020). Rectifying these response patterns requires the ability to scale, simulate, and project the physical dynamics of moisture and energy at the land-atmosphere interface. However, the mechanics and interactions that currently govern each earth system model vary widely (Randall et al 2007, Li et al 2017). Furthermore, few earth system models simulate and account for the mechanisms and long-term effects of the PCF (e.g. UKESM, CanESM, CESM).
Spatiotemporal limitations and instrument constraints inherent to remote sensing technology present considerable obstacles for extracting information from the subsurface profile and restrict the ability to quantify and/or infer thaw variability with high precision and confidence (Gabarró et al 2023). In the high latitudes, the synthesis of optical remote sensing products remains a challenge due to data availability limitations from mechanical restrictions and natural phenomena, such as frequent cloud cover, short summer periods, and low illumination angles (Esau et al 2023, Gay 2023b). These acquisition challenges, compounded with instrument constraints and limited representation in process-based models, prevent accurate quantification of the PCF (i.e. frost heave-thaw dynamics, subsurface freeze-up and breakup events, and nutrient cycling).
Despite these knowledge gaps, modeling and reanalysis products offer practical solutions that adopt simple sinusoidal annual temperature gap-filling algorithms. Additionally, cross-sensor probability density functions and probabilistic modeling approaches are used to compensate for coarse spatial resolution (Westermann et al 2011). Benchmarking and coupled model intercomparisons (CMIP6) have reduced the broad swaths of uncertainty. Model frameworks continue to improve as a breadth of observational data provide intelligence for calibration and validation (CALVAL) (Collier et al 2018, Beauchamp et al 2020). Concurrently, multimodal applications with AI and observational data quantify, emulate, resolve, and improve upon the physical representation of the climate system with increased flexibility, efficiency, and precision (Koppe et al 2019, Xu et al 2021). Limitations inherent to tractable, modular-limited frameworks and sensitive, computationally intensive AI approaches present opportunities for unifying data harmonization, information sharing, and bias correction among physics-based and data-driven methods (Cuomo et al 2022, Slater et al 2022, Fernandes et al 2023). In response to these sustained and co-emerging limitations, we capture abrupt and persistent changes in subsurface conditions and disentangle control factors driving the PCF with a process-informed, data-driven formulation.

Deep learning
DL is a collection of machine learning algorithms that integrate artificial neural networks (ANN) in multiple layers to discover and extract high-level features and patterns. Convolutional neural networks (CNN) reduce the computational burden of parameterization during training without sacrificing performance. Alternatively, recurrent neural networks (RNN) are complex nonlinear systems that produce feature maps based on impulse sets at discrete time intervals (You et al 2017, Körner and Rußwurm 2021). Equation (E1) contextualizes the feedback modulation critical for long short-term memory networks (LSTM), wherein the learned representation (i.e. recurrent output) at the current time step iteratively combines hidden states and loss (S6). The ability to handle complex multidimensional earth system data while improving data assimilation methods, capturing non-linear relationships, quantifying uncertainty with probabilistic measures, and enhancing modeling precision, efficiency, and forecasting power presents important advantages and motivations for implementing AI into current ESM frameworks. Furthermore, encoding previous hidden states is crucial for preserving the internal dynamics and learned representation of memory. This allows the model to reconfigure and assess future cell states based on performance metrics (Peterson 2002). A classic example is soil moisture, a complex and sensitive covariate governed by abiotics (i.e. climatology, topography, soil hydraulics) functioning as a key explanatory variable responsible for encoding the ecological memory of the earth system (Ogle et al 2015). Ecological memory refers to the impact of previous events and drivers of change on the dynamics and complexities of ecosystem response patterns (Khalighi et al 2022). These elusive, interconnected signals illustrate the mercurial nature of memory and change and demonstrate why capturing these effects is critical for simulating real-world dynamics.
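Although (E1) itself is not reproduced in this section, the generic textbook LSTM gate and state updates convey the same feedback modulation; the notation below is a standard formulation and may differ from the manuscript's (E1):

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \quad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad h_t = o_t \odot \tanh(c_t)$$

where $x_t$ is the input at time $t$, $h_{t-1}$ the previous hidden state, $c_t$ the cell state, and $\odot$ the elementwise product; the training loss backpropagated through these recurrences updates the gate weights at each iteration, which is how previous hidden states remain encoded in the learned representation.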

Study regions and physiography
The study region is composed of four regional subdomains delineated with borough and census tracts in Alaska (The North Slope [TNS], Interior Boreal [INT], Seward Peninsula [SEW], Yukon-Kuskokwim Delta [YKD]) and over 13.1 million unique observations across 790 field sites. Figure 1 illustrates the distribution of the flux tower network and field campaigns across Alaska. These subdomains are characterized by zone-defined climatological variability and indicate dynamic subsurface conditions and aboveground geomorphology.
TNS exhibits continuous permafrost, mesic tundra, acidic/non-acidic loess foothills, hillslope water tracks, colluvial-basin deposits, wet tussock/nontussock sedges, open/closed erect prostrate and dwarf shrubs, moss and forb communities, and sporadic wet graminoid vegetation at higher altitudes (Walker et al 1989, Raynolds et al 2019). The INT displays extensive land cover heterogeneity, i.e. rolling hills and lowland terrain with coniferous and deciduous forest canopies, continuous and discontinuous frozen soil and permafrost, thermokarst, lowland peat bogs, and wet bottomlands. This region is characterized by continental climate regimes, frequent wildfire activity, high elevation intersections between alpine tundra and arctic lowland tundra, and rapid ecological succession ranging from wet, disturbed regions with shrub and productive graminoid colonization to warm, dry hillsides with spruce replacement. The SEW is characterized by surface/subsurface dynamics ranging from sparse upland tundra flats with low productivity and shallow active layers, well-drained embankments with average productivity and spruce-dominated stands, and densely forested riparian networks with deep active layers (Raynolds et al 2019). The YKD is defined by low- and upland terrain with extensive wet tussock sedge tundra, lichen-rich peat cover, low vascular plant speciation, and dry tussock graminoid-dwarf shrub communities (Raynolds et al 2019). Plant functional types and species richness contain mesic lichen, sedges, and dwarf shrub vegetation, and isolated instances of erect, prostrate shrubs across volcanic outcrops and drainages (Frost et al 2020). The YKD displays spatially variable, terrain-dependent frozen soil with continuous, discontinuous, and sporadic permafrost peppering the upland tundra, flat rolling hills, and eolian surficial deposits. This region exhibits coastal maritime climate patterns with frequent upland wildfires fueled by lichen-rich peat and tussock litter, high shrub abundance, and low seed recruitment (Frost et al 2020).

Following the workflow in figure 2, the construction of the base dataset and subsequent model development required spatiotemporally constrained preprocessing methods to address latent barriers (i.e. study design), reduce dimensionality, initialize gap-filling and data transformations (i.e. scaling), and preserve information legacies. Dimensionality reduction was used to reduce model complexity and improve generalization, reduce multicollinearity, and enhance computational efficiency. Replicate samples were considered independent features and were not resampled to reduce overgeneralization and bias inflation. The preprocessing pipeline used supervised learning while missing values were imputed by backward-forward gap-filling bilinear interpolation. This gap-filling method served as a padding mask for flux data, reducing sample mismatch and enabling temporal alignment over 53 years. Feature engineering addressed data type conflicts and replicate sampling by removing string-type fields; the variance inflation factor (VIF) was then used to screen the remaining features. VIF functions as a diagnostic tool in multivariate forecasting to quantify multicollinearity severity, providing a scalar measurement that encapsulates how much the variance of an estimated regression coefficient increases when the predictors are correlated (Craney and Surles 2002, Akinwande et al 2015).
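As an illustration of these two preprocessing steps, the sketch below shows how gaps can be filled by backward-forward interpolation and how features can be screened iteratively with variance inflation factors; the column handling, the simple linear interpolation stand-in, and the VIF threshold of 4 are assumptions for illustration, not the exact GeoCryoAI preprocessing code.

```python
# Illustrative preprocessing sketch: gap-filling and iterative VIF screening.
# Thresholds and the linear-interpolation stand-in are assumed for illustration.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def gap_fill(df: pd.DataFrame) -> pd.DataFrame:
    """Backward-forward gap-filling along the time index (simplified to linear)."""
    return (df.interpolate(method="linear", limit_direction="both")
              .bfill()
              .ffill())

def vif_screen(df: pd.DataFrame, threshold: float = 4.0) -> pd.DataFrame:
    """Iteratively drop the numeric feature with the highest VIF above threshold."""
    features = df.select_dtypes("number").dropna().copy()
    while features.shape[1] > 2:
        vifs = pd.Series(
            [variance_inflation_factor(features.values, i)
             for i in range(features.shape[1])],
            index=features.columns)
        worst = vifs.idxmax()
        if vifs[worst] < threshold:
            break
        features = features.drop(columns=worst)   # remove the most collinear feature
    return features
```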

Data synthesis
PCA compresses data in linear systems; however, for nonlinear relationships, multivariate dimensionality reduction with autoencoding networks was used to explore features' spatiotemporal response patterns via neural-based decomposition. ADF and KPSS methods address deterministic trends and lagging complexities by examining unit root stationarity and non-stationarity in time series data (Lopez 1997). The most significant principal components exhibiting maximum variance were extracted from the 56-feature VIF-reduced multivariate dataset in accordance with dimensionality reduction methods at an 85% cumulative explained variance (CEV) threshold (86.12% [VIF: 61.96%; PCA: 52.17%]). Moreover, pairwise correlation matrices and serial autocorrelations were computed with ranking criteria and Fourier transformations of the power spectra. Based on these results, three PCF features were targeted within the base dataset: ALT, methane flux, and carbon dioxide flux (ALT; CO2_1_1_1; FCH4_1; table 3). These measurements were reframed and prepended with time-lag −3 lead +2 supervision (i.e. learning encouragement) prior to model development, as sketched below.

GeoCryoAI is a state-of-the-art DL architecture composed of bidirectional memory-based recurrent networks (Schuster and Paliwal 1997, Graves and Schmidhuber 2005, Shi et al 2015, Gay et al 2022, Gay 2023b) capable of uncovering hidden response patterns and disturbance signals from in-situ measurements, remote sensing observations, and modeling output (figure 4). This framework presents a novel method to unify spatiotemporal multimodal data. The application quantifies and resolves real-world dynamics and disparities among abrupt and persistent drivers of change. For this study, the in-situ module and corresponding layer architecture are utilized for teacher forcing (S4 and S5). The process-driven component of the network accounts for real-world physical dynamics. The model implements test data and evaluation criteria to simulate ecological memory, highlighting the direct, indirect, concurrent, lagged, internal, and external effects of thaw-induced carbon release on the surrounding environment. The data-driven component is characterized by a CNN-filtered LSTM-encoded RNN that uses teacher forcing from in-situ and flux observations into a network of processing layers with activation and regularization functions (Williams and Zipser 1989). The outputs are trained within an autoencoder framework that encodes and imputes optimized decoding protocol for generative adversarial training and synthetic time series reconstruction (Yoon et al 2019, Klopries and Schwung 2023). This generative process utilizes encoder-decoder compression to denoise, transform, and learn the latent feature space representation of inputs while using the identity function to reconstruct the inputs from the compressed latent-space vector (Kim et al 2021, Bourland et al 2022, Chen and Guo 2023).
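The time-lag/lead reframing mentioned above can be expressed as a sliding-window transformation that prepends lagged copies of each target and appends lead columns. The function below is a minimal sketch under stated assumptions: the lag of 3 and lead of 2 follow the text, but the function, dataframe, and column names are hypothetical.

```python
import pandas as pd

def reframe_supervised(series: pd.Series, n_lag: int = 3, n_lead: int = 2) -> pd.DataFrame:
    """Reframe a univariate series into a supervised-learning table.

    Columns t-3..t-1 hold lagged predictors; t..t+2 hold the current value and leads.
    """
    frames = {}
    for i in range(n_lag, 0, -1):                  # lagged predictors t-3, t-2, t-1
        frames[f"t-{i}"] = series.shift(i)
    for j in range(0, n_lead + 1):                 # current value and leads t, t+1, t+2
        frames["t" if j == 0 else f"t+{j}"] = series.shift(-j)
    return pd.DataFrame(frames).dropna()

# Example with an assumed dataframe `df` holding a thaw-depth column:
# supervised_alt = reframe_supervised(df["ALT"])
```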

Model development
The structure of the model configuration is optimized with multimodal inputs and a predefined hyperparameter dictionary generated by a sequential model-based optimization method (i.e. Bayesian optimization search algorithm). These nonlinear programs culminate in a series of logical operations that scan and determine the best-performing model configuration based on regularizing model complexity and optimizing learning efficiency (i.e. early stopping and hyperparameter tuning in latent space). During training, validation loss is used as the objective function. The tuner constructs an a priori distribution model and iterates through a set of hyperparameters (Hp) to compute the lowest objective score (f(x)) across a distribution of validation metrics (E2). The P(f(x)|Hp) model is updated every trial with these results, and new criteria are established from a selection function (Wu et al 2019). The GeoCryoAI model emulated hidden dynamics, linking approximately 2.511 million parameters with 1.177 billion interpolated measurements to computationally reflect the state space of the earth system from in-situ measurements. The model design required an efficient, flexible approach to emulate nonlinearities with ground-truth observations, with module reconstructions including consolidated tabular time-series layer processing (i.e. LSTM) and sequential time-distributed convolving layers (i.e. Conv1DLSTM). During construction, expanded layer selection was guided by the loss of spatial information with LSTM applications and the persistent degradation in quality inputs in both data structure and processing resilience. Model performance was optimized via hyperparameter tuning, with the loss function reconfiguring weight matrices while facilitating active learning and convergence, generating output vectors with every batch over each epoch. Furthermore, error metrics were the objective of the NN, with each iteration seeking to reduce the effects of the loss function on weight matrices and vector components. In response to Bayesian optimization hyperparameter tuning and validation loss captured after training, the training and validation subsets were shifted temporally to provide more data for the model to learn (ALT: 1969–2018, 2019–2020; CH4: 2011–2018, 2019–2020; CO2: 2003–2018, 2019–2020).
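A minimal sketch of how such a configuration search might be wired with Keras and keras-tuner is shown below; the layer stack, search ranges, window length, and feature count are placeholders for illustration and do not reproduce the actual GeoCryoAI hyperparameter dictionary or architecture.

```python
# Illustrative Bayesian hyperparameter search over a Conv1D/LSTM stack (placeholders only).
import keras_tuner as kt
import tensorflow as tf

N_STEPS, N_FEATURES = 6, 56   # assumed window length and feature count

def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_STEPS, N_FEATURES)),
        tf.keras.layers.Conv1D(hp.Int("filters", 16, 128, step=16),
                               kernel_size=3, padding="causal", activation="relu"),
        tf.keras.layers.LSTM(hp.Int("units", 32, 256, step=32)),
        tf.keras.layers.Dropout(hp.Float("dropout", 0.0, 0.5, step=0.1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(
                      hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
                  loss="mse", metrics=["mae"])
    return model

# Validation loss is the objective; each trial updates the tuner's surrogate model.
tuner = kt.BayesianOptimization(build_model, objective="val_loss", max_trials=25)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=100,
#              callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)])
```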

Synthesis results
The cleaned 56-feature dataframe yields dynamic relationships; more specifically, thaw depth and subsurface soil respiration demonstrate moderate anticorrelation (r_soil_[CO2]_15cm: −0.311942), while thaw depth, CH4, and CO2 fluxes illustrate positive correlations (r_FCH4_1: 0.334563; r_FCH4_2: 0.309447; r_CO2_1_1_1: 0.243649). These corroborate the rate-limiting biogeochemical mechanisms that fortify the permafrost-carbon feedback: increased thaw yields higher water availability, halting respiration and instead fostering warmer anoxic environments suitable for methanogenesis. At the surface, increased thaw and water availability promote aerobic respiration and evapotranspiration in the soil and surrounding vegetation. Further anticorrelated interactions were noted, i.e. thaw depth relative to GHF (r_G_1_1_4: −0.267481) and WT (r_TW_1_1_1: −0.219022). In addition to flux variables, a positive correlation was identified between thaw depth and WTD (r_WTD_2_1_1: 0.237545). The complexities of the earth system are made apparent wherein mixed relationships were observed as well, with AT exhibiting negative and positive correlations relative to thaw depth (r_TA_1_2_1: −0.205626; r_TA_1_3_1: −0.180273; r_TA_1_5_1: 0.263813). We examined the spatiotemporal distribution (i.e. statewide versus regional, 1969–2022) of permafrost degradation across the four subdomains. TNS exhibits the lowest ALT mean and variance in permafrost degradation (48.460 ± 14.976 cm), INT displays the highest ALT variance (±31.164 cm), and the YKD demonstrates the highest ALT mean (62.415 cm). The temporal period among these features varies substantially (i.e. ALT, 1969–2022; CO2, 2003–2021; CH4, 2011–2022). Thaw depth measurements were constrained in alignment with the temporal resolution of carbon flux (CH4, CO2) to identify potential nuances governing the dynamics of the PCF. The temporally constrained plots in figure 5 illustrate contrasting annual thaw depth, carbon dioxide, and methane flux variability across Alaska.
Abrupt thaw signals in 2014 and 2017 are complemented by persistent increases in CO2 and CH4, and increased CH4, respectively. An interesting anticorrelated relationship is also observed: carbon dioxide and thaw signal relationships are less obvious, which may be attributed to increased sequestration of CO2 resulting from tundra greening and increased primary productivity. Another explanation may be rooted in feedback attribution: carbon dioxide regulates thaw rate, but due to increased warming, methane and thaw experienced runaway effects (i.e. 2014–2017). Error metrics (RMSEs) for ALT, CO2, and CH4 were computed from feature variances originating from calibration and measurement error prior to model training (i.e. aleatoric stochasticity): 4.348 cm, 20.117 µmol CO2 m−2 s−1, and 79.185 nmol CH4 m−2 s−1, respectively.
The evaluation metrics demonstrate the performance and predictive capacity of the model (figure 6(a)). Although the predictions indicate accurate impulses over time (figure 6(b)), the magnitudes of ALT predictions are underestimated while CH4 and CO2 flux predictions are overestimated. The contributing factor for these disparities is likely data imbalance. Although randomization of data splitting (i.e. training, validation, testing sets) is best practice, data partitioning prioritized sample representation after cleaning while retaining the temporal structure. Moreover, the spatiotemporal sparsity of data samples likely contributed to these disparities, which was both accounted for and mitigated with gap-filling and time lag-lead methods. For example, ALT training data consisted of 72.63% of the 98 753 ALT samples (1969–2022), CO2 training data consisted of 74.05% of the 547 101 CO2 samples (2003–2021), and CH4 training data consisted of 71.23% of the 212 642 CH4 samples (2011–2022). Another possible factor contributing to underestimation is the introduction of excessive regularization (i.e. dropout) in the modeling framework, leading to an oversimplification of the model and feature variance. Conversely, the potential issues facilitating overestimated feature predictions may originate from model overcomplexity, thus capturing spurious dynamics. To counteract these potential discrepancies and misinterpretation of derived results, more sensitive evaluation metrics could be evaluated, in addition to further refining the Bayesian hyperparameter optimization tuning process to include more operands (i.e. loss functions, activation functions) and continuous arrays (i.e. batch size, epochs) to isolate relevant causes of these magnitude disagreements.
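A hedged sketch of the kind of chronologically ordered partitioning described above is shown below; the fractions roughly echo the reported training shares, but the function, dataframe name, and exact split points are assumptions rather than the study's partitioning code.

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, train_frac: float = 0.72, val_frac: float = 0.14):
    """Split a time-indexed dataframe chronologically (not at random),
    preserving the temporal structure of the series."""
    df = df.sort_index()
    n = len(df)
    i_train = int(n * train_frac)
    i_val = int(n * (train_frac + val_frac))
    return df.iloc[:i_train], df.iloc[i_train:i_val], df.iloc[i_val:]

# Example with an assumed date-indexed dataframe `alt_df`:
# train, val, test = temporal_split(alt_df)
```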

Discussion
To our knowledge, this study is the first to use a process-constrained DL system to understand key regulators of the PCF. The model presented minor prediction errors and exposure biases, and the teacher forcing approach simplified the loss landscape in exchange for computational efficiency. Over time, however, model performance degraded, thereby warranting spatial expansion and more discriminant temporal delegation. In addition, the vanishing gradient (i.e. learning terminates, performance decreases) presented multiple challenges throughout the training process, including the risk of overfitting due to model complexity (i.e. dampened with dropout generalization) and the inability to label sparse and coarse data. To ameliorate these challenges, regularization and optimization strategies were integrated into the pipeline to isolate the ideal learning rate and optimization for thaw-induced carbon signal retention. Optimization strategies included modifying the layer structure and constituency of a layer, instance, or group, weight standardization, and synchronous batch normalization with in-situ data and hybridized ensembles (Ioffe and Szegedy 2015).
DL models perform well in time series prediction; however, quantifying the internal mechanisms presents many challenges limiting the utility of their application. In addition, the absence of interpretability can hinder knowledge transfer among disciplines and necessitates more explainable methodologies. The current foundational understanding of these complex systems indicates the multilayered internal mechanisms facilitate active learning from error via backpropagation, non-linear activation functions (e.g. ReLU, tanh, sigmoid), and weight-optimized regularization between bias-corrected nodes (e.g. dropout, batch normalization; Vakalopoulou et al 2023). This knowledge is important when gauging the feasibility of integrating AI into ESM frameworks; more specifically, identifying feature importance for explainability (e.g. LIME, SHAP), designing attention mechanisms for interpretability, developing hybridized models with physical constraints, and employing ensemble-based Bayesian methods with confidence intervals for uncertainty quantification and enhanced reliability.
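As a hedged illustration of the post-hoc explainability route mentioned here, a model-agnostic SHAP analysis over a trained network's prediction function might look like the following; the `model`, `x_train`, and `x_test` arrays are assumed to exist and to be 2-D feature matrices (sequence inputs would first be flattened), and the background sample size is arbitrary.

```python
# Illustrative post-hoc feature attribution with SHAP's model-agnostic kernel explainer.
# `model.predict`, `x_train`, and `x_test` are assumed; inputs are 2-D feature matrices.
import numpy as np
import shap

background = x_train[np.random.choice(len(x_train), 100, replace=False)]
explainer = shap.KernelExplainer(lambda x: model.predict(x), background)
shap_values = explainer.shap_values(x_test[:50])   # per-feature attribution per sample
shap.summary_plot(shap_values, x_test[:50])        # rank features by mean |attribution|
```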
Uncertainty and prediction error overgeneralize performance gains resulting from overfitting during training. This behavior was expected and is a function of the initial model's inability to capture sub-seasonal trends in freeze-thaw dynamics (e.g. peak thaw, abrupt thaw events) from available data instances with standardization and scaling methods. Additional uncertainty may originate from landscape-level dynamics and sustained regional lagged effects in response to increased warming trends (i.e. LHF). Moreover, field observations and flux tower data may introduce error through instrument failure, perturbation, signal-to-noise calibration issues, and sampling redundancy (i.e. aleatoric uncertainty). Other potential sources of uncertainty may be reconciled with other pre-processing methods (i.e. temporal downscaling), data collection, and data assimilation techniques.
Ongoing work seeks to isolate causal links and feedback drivers of the PCF with multivariate mutual information and transfer entropy. In addition, we continue to modulate the spatiotemporal data lens to quantify, scale, and forecast PCF feedback mechanisms across Alaska by assimilating multimodal remotely sensed hyperspatiospectral inputs and model outputs into the GeoCryoAI framework (e.g. AVIRIS-NG, UAVSAR, SIBBORK-TTE, TCFM-Arctic). Airborne hyperspectral imaging and synthetic aperture radar systems provide a means to quantify, scale, and interpret features at higher resolution and precision over broader domains (e.g. carbon flux, thaw subsidence). In addition, these efforts seek to extrapolate PCF signals across the Circumarctic and generate space-time zero-curtain maps in parallel with future spaceborne missions (e.g. NISAR, MicroCarb, CO2Image, MERLIN, TRISHNA). This methodology will facilitate future exploration into knowledge discovery, reconcile algorithm and instrument limitations, resolve empirical assumptions and observational bias, and inform mitigation efforts and risk management across the Arctic region.

Appendix B. Glossary of terminology
Accuracy: The accuracy of a model refers to the percentage of correct predictions relative to observations.
Activation function: The activation function extracts the weighted sum of inputs from a previous layer and generates output value(s) to catalyze or 'activate' the next layer in the architecture.
Active learning: Active learning is a specialized form of semi-supervised machine learning wherein a learning agent interacts with an oracle (i.e. user) to obtain labels for new data.
Algorithm: An algorithm is a specific method and/or process that outlines how to solve a research problem and generate machine learning frameworks.
Artificial neural network: An ANN is a structure composed of successive layers of interconnected units known as artificial neurons. It incorporates nonlinear activation functions and resembles the neurons in an animal brain.
Augmented Dickey-Fuller test: A common statistical test often utilized to determine the stationarity of a particular time series in question.
Autoencoder: An autoencoder is a type of ANN used for unsupervised and non-linear dimensionality reduction. It aims to generate efficient representations of data.
Backpropagation through time: Backpropagation through time is a technique used to train ANNs for computing gradients required for weight calculation in the network.
Batch: A batch refers to a group of examples used for one gradient update during model training.
State space: State space refers to the formulation of an artificial intelligence problem in terms of states and operators. It usually consists of a start state, goal state, and a problem space that is searched to find a solution.
Test set: A test set is a collection of observations used to assess the predictive performance of a model at the end of training and validation; more concretely, this subset of the data available provides further insight into model performance based on its ability to generalize and fit new unexplored datasets.
Testing: Testing evaluates the final performance of a model using hold-out data in supervised machine learning. Testing data refers to the subset of available data selected for the testing phase during model development.
Time series: A time series is a sequence of data points recorded at specific times and indexed according to their order of occurrence.
Training data: Training data is a subset of the data available and is commonly associated with the methodology of algorithm construction, providing information to learn and subsequently infer and generate predictions throughout model development.
Training set: A training set is a set of observations used to generate machine learning models through the training process.
Transfer learning: Transfer learning is a branch of machine learning that leverages knowledge gained from solving one problem to address a different but related problem. It involves reusing a pre-trained model or its weights as a starting point for a new model on a different task.
Uncertainty: Uncertainty refers to a range of values that are likely to contain the true value, representing the lack of certainty in a prediction or estimation.
Underfitting: Underfitting occurs when a machine learning algorithm fails to capture the underlying structure of the data effectively. This often happens when the model is too simplistic or inappropriate for the given task. Underfitting results in poor performance on both the training and test sets, as the model fails to incorporate relevant variations in the data necessary for accurate predictions.
Unsupervised learning: Unsupervised learning is a branch of machine learning that focuses on inferring a function describing the structure of unlabeled data. It involves training a model to discover patterns in an unlabeled dataset, such as clustering.
Validation: Validation is the process of using a separate set of data, known as the hold-out data, to assess the performance of a trained model. It helps determine if any iterative modifications are needed to improve the model's performance. Validation provides feedback on how well the current parameters of the model generalize beyond the training set. It is distinct from the testing phase, which is the final assessment of the model's performance.
Validation set: A validation set is a collection of observations used during model training to evaluate how well the current parameters of the model generalize beyond the training set. By assessing the performance on the validation set, one can identify if the model is overfitting. If the model's performance improves on the training set but deteriorates on the validation set, it suggests overfitting, and training should be terminated.
Vanishing gradients: Vanishing and exploding gradients present significant challenges and obstacles to the performance of recurrent neural networks in gradient-based learning methods with backpropagation. This issue arises because the weights of the neural network are updated in proportion to the partial derivative of the error function with respect to the current weight during each iteration of training.
Variance: Variance refers to the error caused by the sensitivity of a model to small fluctuations in the training set. It is computed as the expectation of the squared deviation of a random variable from its mean. A low variance indicates that a model is internally consistent, with predictions exhibiting minimal variation from each other after each iteration. However, high variance and low bias suggest the model may be overfitting and excessively relying on noise present in the training set.
Variance inflation factorization: A statistical measurement that quantifies the multicollinearity between independent variables within a given time series and/or regression problem.
Verification: Verification is the process of confirming that an implemented model functions as intended, ensuring that it operates according to the expected specifications and requirements. It involves validating that the model behaves correctly and produces the desired outputs within the defined constraints.
Wiener-Khinchin theorem: Mathematical theorem that establishes a relationship between the autocorrelation of a stationary time series relative to the power spectral density of the random processes governing the time series in question.

Figure 1. Spatial distribution of subdomains and ground truth networks across Alaska. The colorized polygons represent each subdomain, and the white points reflect in situ and flux sites used for teacher forcing.

Figure 2. Data processing and model development workflow.

Figure 3. Correlation matrix of the VIF-trimmed 56-feature dataset after multicollinearity reduction (i.e. 35 features removed) with high-confidence, low-correlation feature dependency. The original 91-feature base dataset correlogram is referenced in S1.

Figure 5. Annual time series of ALT, CO2, and CH4 measurements relative to temporal coverage (a). The underlying plots are constrained to the temporal period of carbon dioxide and methane flux variability across Alaska, 2006-2019 (b).

Figure 6. Loss functions (a) and predictions (b) derived from GeoCryoAI simulations of in situ thaw depth and carbon release.

Table 1. In-situ and eddy covariance data used for teacher forcing and model training.

Table 2. Comprehensive iterative ranking of multicollinearity reduction (VIF), displaying the most significant features contributing to the least variance (VIF threshold < 4). These features were dropped, with the remaining features listed in S2.

Table 3. Stationarity results from ADF and KPSS testing on raw and gap-filled (GF) pre- and post-trend differenced datasets.
We thank the following institutions and campaigns for their research support, data accessibility, and funding opportunities: NASA, USGS, ABoVE, CALM, UVA, UAF, GWU, NAU, UC, WCRC, ARCUS, and the USPA.