Leaving No Branches Behind: Predicting Baryonic Properties of Galaxies from Merger Trees

Galaxies play a key role in our endeavor to understand how structure formation proceeds in the Universe. For any precision study of cosmology or galaxy formation, there is a strong demand for huge sets of realistic mock galaxy catalogs, spanning cosmologically significant volumes. For such a daunting task, methods that can produce a direct mapping between dark matter halos from dark matter-only simulations and galaxies are strongly preferred, as producing mocks from full-fledged hydrodynamical simulations or semi-analytical models is too expensive. Here, we present a graph-neural-network-based model that is able to accurately predict key properties of galaxies such as stellar mass, g − r color, star formation rate, gas mass, stellar metallicity, and gas metallicity, purely from dark matter properties extracted from halos along the full assembly history of the galaxies. Tests based on the TNG300 simulation of the IllustrisTNG project show that our model can recover the baryonic properties of galaxies to high accuracy, over a wide redshift range (z = 0–5), for all galaxies with stellar masses more massive than 109 M ⊙ and their progenitors, with strong improvements over the state-of-the-art methods. We further show that our method makes substantial strides toward providing an understanding of the implications of the IllustrisTNG galaxy formation model.


INTRODUCTION
According to the standard model of cosmology, the matter content of the Universe is dominated by dark matter, which interacts with baryons mainly through gravity (Planck Collaboration et al. 2016).Galaxies are believed to form in dark matter halos, which begin as low-mass entities and grow in mass by mergers and accretion of matter (e.g., White & Rees 1978;Springel et al. 2005).The hierarchical growth for any given dark matter halo can be visualized as a "merger tree" (see Figure 1).It is believed that the halo growth history plays an important role in shaping the galaxies they host (Somerville & Davé 2015;Naab & Ostriker 2017), but it is non-trivial to recover the baryonic prop-erties of galaxies solely from merger trees (Jespersen et al. 2022).Much more sophisticated models, such as semi-analytical models (SAMs; Kauffmann et al. 1993;Somerville et al. 2008;Croton et al. 2016) or hydrodynamical simulations (Pillepich et al. 2018;Villaescusa-Navarro et al. 2021;Pakmor et al. 2022) are required, at the cost of several tens of million CPU (central processing unit) hours, which makes many otherwise valuable applications impossible.
For large-scale sky surveys aiming at understanding the constituents of the Universe, having realistic mock galaxy catalogs is essential, as they are required for estimating covariance matrices for constraining cosmological parameters, performing end-to-end validation of the analysis pipeline, as well as estimating the accuracy of photometric redshifts and masses of galaxies, just to name a few (e.g., Breivik et al. 2022).Cosmological surveys are particularly demanding, as tens of thousands of mocks over significant volumes are needed to reduce systematic uncertainties (e.g., Villaescusa-Navarro et al. 2020;Rossi et al. 2021).In order to efficiently generate realistic mock catalogs, many astronomers have resorted to machine learning (ML), and trained models with outputs from numerical simulations that can reproduce key properties of the galaxy populations, such as stellar mass (M ⋆ ), color, and star formation rate (SFR).Previous works (e.g., Kamdar et al. 2016;Agarwal et al. 2018;de Santi et al. 2022;Lovell et al. 2022) have attempted to assign baryonic properties to dark matter halos through various ML algorithms, such as Multi-Layer Perceptrons (MLPs), Random Forests, Extremely Randomized Trees, or a combination of these.However, these works only utilize the dark matter properties at z = 0, and an inefficient encoding of the merger history, leading to suboptimal performance.Recently, Jespersen et al. (2022) have developed a Graph Neural Network (GNN) model, Mangrove, to leverage the full information of merger trees.Jespersen et al. (2022) recovered M ⋆ , cold gas mass and metallicity, and supermassive black hole mass with much higher precision than approaches not taking the merger history explicitly into account.However, the original version of Mangrove only predicts the baryonic properties for a single galaxy per merger tree, and is based on only the Santa Cruz SAM (Somerville et al. 2015), instead of a cosmological hydrodynamical simulation.
In this paper, we put forth a model based on Mangrove that uses all the available information of dark matter halos along the merger trees of a large sample of magnetohydrodynamically simulated galaxies, such that the relevant, baryonic properties of the model galaxies can be predicted to higher accuracy than ever before for all subhalos at z ≲ 5, which has never been achieved before.
This paper is structured as the following.In Section 2, we describe how the data is obtained, the structure of the GNN model, and the loss function utilized.In Section 3, we present the overall performance of the model.Section 3.1 compares the performance of our model with existing studies, while Section 3.2 shows the prediction of different baryonic properties of galaxies.In Section 4, we discuss the caveats, prospects of applications and potential extensions of our model.

DATA PREPARATION AND MODEL
We start by describing the simulation data set used in this work (Section 2.1), detailing our treatment of dark matter halo merger trees (Section 2.2).We then present our GNN model (Section 2.3), describing the arrangement of our edge, node, and global attributes, the loss function as well as the metric used to gauge the model performance.

Simulation and Merger Trees
To train our model, we extract the dark matter and baryonic features from the TNG300, which has a box of 303 Mpc on a side and contains (2500) 3 dark matter particles.The gravitationally bounded subhalos (substructures of halos; one can regard that galaxies all live in subhalos) are identified by the SUBFIND algorithm (Springel et al. 2001).All the properties in this work are derived from individual subhalos and their respective parent halos.We extract the baryonic properties and dark matter properties for each subhalo from the full magnetohydrodynamical (MHD) simulation and the corresponding dark matter-only (DMO) simulation separately.The subhalo-level merger trees are constructed by the Sublink algorithm (Rodriguez-Gomez et al. 2015).Figure 1 depicts an arbitrarily chosen merger tree with a final dark matter subhalo mass 2 of M DM ∼ 10 11 h −1 M ⊙

Data Selection and Augmentation
In this work, we aim to reconstruct the stellar mass, SFR, g − r color, gas mass (M gas ), gas metallicity (Z gas ) and the stellar metallicity (Z ⋆ ) along the whole merger tree.These are key quantities that can be inferred from the observables of galaxies and are therefore important for creating mock catalogs.We limit ourselves to using merger trees that have a final dark matter subhalo mass of M DM ≥ 10 11 h −1 M ⊙ and with a total number of subhalos throughout the tree of between 10 − 2 × 10 4 , but we validate, test, and show the results (Section 3) only for subhalos at a redshift z ≤ 2 (when the age of the universe is ≥ 3.285 Gyr) and with a dark matter mass ≥ 10 11 h −1 M ⊙ .For the dark matter features, we use all subhalo and halo features provided by the TNG300 simulation except for the IDs as well as 3D positions, 3D velocities, and 3D spin.Although the xyz-components spin is defined with respect to the halo and thus formally invariant to changes in the coordinate system, we decided to exclude it to be sure that no spatial information leakage between the training and test set would occur.However, because the total spin J = J 2 x + J 2 y + J 2 z should be spatially uncorrelated, we decide to include this.Since merger trees consist of subhalos (which in a graph can be abstracted as nodes) and the links (which in a graph can be abstracted as edges) between subhalos at different snapshots, we also utilize features for the edges (i.e., mass ratio of a progenitor and a descendent linked by an edge).Lastly, we include global features of the merger trees (e.g., the toal number of progenitors in merger trees).Please refer to Appendix A for the full list of the dark matter features used.
We extract the baryonic properties for each subhalo from the full MHD simulation and map the properties to the subhalos in the merger trees of the DMO simulation using the LHaloTree algorithm (Nelson et al. 2015).Bijective matching of subhalos is achieved by comparing unique dark matter particle IDs, matching the subhalos with the highest fractions of common particles.At z = 0, the matching fraction of the subhalos with M DM ≥ 10 11 h −1 M ⊙ is 100%, while the matching fraction of subhalos in the merger trees used for training, validation, and testing our model throughout cosmic history is 73.14%.During the classification process described in Section 2.3.2,we assume that non-matched subhalos in the DMO simulation do not contain baryonic components and they are therefore omitted in the regression process.

Data Split
To prevent information leakage between the training, validation, and test data sets, we divide the TNG300 box into 8 equal-sized sub-boxes.Out of these sub-boxes, 6 are allocated to the training set, one to the validation set, and one to the test set.We randomly sample the 7231 merger trees in the training set, 1057 in the validation set, and 1031 in the test set.We train the GNN model using different combinations of hyperparameters and verify its performance on the validation set.We select the best-performing model and present the results obtained from that model on the test set in the following sections.The test set is never used for model tuning, and is never used before the final results are prepared.For details regarding the tested hyperparameters, please refer to Appendix B.

Graph Neural Networks
Our main goal is to map the properties of galaxies onto merger trees, which can easily be represented as directed graphs.Thus, it is natural to employ a GNN to extract information from merger trees.Merger trees exhibit intricate structures, yet they can be handled through three fundamental graph components: nodes (V ), edges (E), and global attributes (U ).Nodes are entities with attributes, such as dark matter subhalos with a radius and mass.Edges describe the relation among different subhalos at different snapshots including a merger or inheritance.Edges can have attributes as well, such as the mass ratio of a progenitor and a descendent.The ratio could indicate which progenitor has a larger impact on the descendent.The relation between the baryonic and dark matter properties of a subhalo can be sensitive to the global condition of a merger tree, such as the total number of progenitors, which is then tracked as a global attribute.
Typically, a GNN layer consists of a sequence of message passing, aggregation, and node/edge/global update processes.In our work, a merger tree is provided as an input to the GNN in the form of a graph.Each node aggregates information from its neighboring nodes and updates its own state accordingly.A GNN layer can be divided into three distinct subcomponents: the edge model (ϕ e ), the node model (ϕ v ), and the global model (ϕ u ).In our implementation, all of these submodels consist of two layers of MLPs and include a ReLU activation function between each MLP layer.The edge model takes the features from each pair of connected nodes (v i , v j ∈ R 2Lv ), where L v represents the number of features in each node, along with the edge attributes, to generate a message vector.The message vectors associated with each target node are element-wise aggregated and passed to the node model.The node model then updates the features of the target nodes based on the aggregated messages, their original features, and the global properties of the graph.Finally, the global model aggregates all the updated node features and message vectors to update the global properties of the graph.
Previous works have developed different kinds of GNN layers for different scientific purposes.Among them, Battaglia et al. (2018) developed the MetaLayer, a flexible type of GNN layer that allows for the incorporation of any relational inductive bias.We customize the MetaLayer to study the mapping between dark matter merger trees and galaxies.We introduce a strong inductive bias that assumes merger events are significant in the growth of dark matter subhalos.We implement the inductive bias by rearranging the edge connection as described in Section 2.3.1.In our full model, we utilize four sequential MetaLayers.The model is trained by minimizing the loss described in Section 2.3.2 while we gauge the performance of the model with several different metrics described in Section 2.3.3.A schematic view of the information flow within the GNN during training is provided in Figure 2a.

Rearrangement of Edge Connections
Given that the merger trees span 100 simulation snapshots, a GNN with 100 layers would be required to transmit information from the beginning to the end of the merger trees.However, such an approach would result in over-smoothing, which negatively affects the predictive ability of individual nodes and the GNN as a whole (Oono & Suzuki 2019).To facilitate information transfer and impose an inductive bias of merger events having higher importance, we modify the connection scheme for each subhalo.First, we classify the subhalos into five categories (shown by different shapes of nodes in Figure 1 and Figure 2b): • (1) Progenitors (i.e., subhalos that are the first to be identified on a given branch).
Figure 2b provides a schematic view of the rearrangement process.This reduces the number of required layers while preserving the inductive bias related to the importance of merger events.Nonetheless, there are some subhalos that conform to more than one of the five categories.For the subhalos that belong to both categories (1) and ( 4), (4) takes priority over (1) and will be connected to the post-merger subhalos.For the subhalos that conform to categories (2) and (3), we adopt (2) over (3) and the subhalos will be connected by those in (4).For the subhalos belonging simultaneously to categories ( 2) and ( 4), both the rules applied to (2) and ( 4) take effect individually.The other conformations either do not exist by definition of the categories or are excluded by the tree selection procedure.

Loss function
In this section, we define the loss function used to train our model.It comprises three distinct components: a classification loss to determine the presence of corresponding baryonic features in dark matter subhalos, a regression loss to predict the baryonic features, and L1 and L2 norms (defined in Equation 3) to prevent overfitting.
Throughout different snapshots in the TNG300 dataset, an average of ∼ 70% of the dark matter subhalos do not contain stars, ∼ 80% lack star-formation activity, and ∼ 40% do not contain gas.Thus, our model is trained to simultaneously distinguish whether a subhalo contains stars, gas, and/or star-formation activity and to regress the amount in the case that a subhalo does contain a non-zero amount of the relevant target.We define subhalos containing star and gas particles or are star-forming as positive cases, while those lacking stars/gas or star formation are considered negative.The total loss therefore has a classification component, quantified using a cross-entropy loss, which is commonly used to measure the discrepancy between two discrete probability distributions (Good 2018), where H f,i represents the cross-entropy loss for baryonic target f of subhalo i, while P n and Q n denote the predicted and true probability, respectively, for a subhalo to be classified as class n (i.e., be classified as either having a zero or non-zero amount of stars, gas, or star formation).Our model only consists of two classes (positive or negative), resulting in a summation with only two terms.In Equation 1, p and q respectively represent the predicted and true probabilities for a subhalo to be positive For subhalos classified as containing star particles, we perform regression on the variables M ⋆ , color, and Z ⋆ .For those classified as containing gas particles, we regress the M gas and Z gas .If a subhalo is classified as star-forming, we also regress the SFR.The regression Merger trees exhibit intricate structures, yet they can be handled through three fundamental graph components: nodes (V ), edges (E), and global attributes (U ).The red parts in rows (i), (ii), and (iii) indicate the node, edge, and global features.When a graph is processed through a MetaLayer, three learnable update functions (ϕ e , ϕ v , and ϕ u ) and three aggregation functions (ρ e→v , ρ e→u , and ρ v→u ) are applied to the input graph.The computation proceeds from the edge to the node and, finally, to the global level.Columns (x), (y), and (z) indicate the graph elements that are involved in each of these computations, respectively.The green color indicates which sub-model is being updated in each column, and the yellow color represents the additional elements that are involved in the update.The green arrows connect the three fundamental components (node, edge, and global attributes) and the sequence of functions applied to them.The process shows the following steps: (1) ϕ e is applied edge-wise, which takes the attributes from the upstream nodes (V i ), the downstream nodes (V j ), and the edge itself (E), and returns an updated version of the edge attribute (E ′ ).(2) ρ e→v is applied for each node, which takes all E ′ s that project to the node and outputs E ′ v .
(3) ϕ v is applied node-wise by taking U , V and E ′ v as the input and returning the updated node attributes (V ′ ).( 4) ρ e→u and ρ v→u are applied to all the edges and nodes in the graph, to yield globally summarized edge E ′ u and node (V ′ u ) properties, respectively.(5) Finally, ϕ u is applied to each full graph, taking E ′ u , V ′ u , and U as the input to update the global attribute (U ′ ).Now, we have the updated edge, node and global properties for the graph.(b) A simple merger tree, as described in Section 2.1, to illustrate the connections between different nodes.The red arrows denote the original merger tree, while the yellow arrows depict the rearranged edge connections described in Section 2.3.1.The rearranged edges allow the latest merger events to have added impact on all subhalos following a merger.Yet, by the rearrangement, we introduce a strong inductive bias that assumes merger events are significant in the growth of dark matter subhalos due to the removal of the smooth accretion mode in subhalos' growth (e.g.removing the connection between nodes F, E, and D).
loss is quantified with a Gaussian negative log-likelihood (2) where G represents the Gaussian negative log-likelihood loss, y f,i and y f,i denote the true and predicted values of baryonic feature f for subhalo i, respectively.Additionally, σ f,i 2 is the predicted variance of the baryonic feature f for subhalo i.
Lastly, we include the L1 and L2 norms of all parameters in the GNN in order to regularize the model and prevent overfitting, where ∥N ∥ L1 and ∥N ∥ L2 are the L1 and L2 norms, P i is the ith parameter in the GNN.Combining the losses above, the total loss we use to optimize the GNN is where h, l 1 , and l 2 are the weights assigned to the crossentropy loss, L1 norm and L2 norm, respectively.The loss weights h, l 1 , and l 2 's are found during the hyperparameter search described in Appendix B.

Metrics
To quantify the performance of our model, we introduce five metrics to gauge its prediction.For classification, we use the F1 score, a combined measure of precision and recall.Precision measures the proportion of correctly identified positive subhalos among all subhalos classified as positive (that is, Precision = T P T P +F P , where T P is the number of true positives and F P is the number of false positives), while recall quantifies the model's ability to correctly predict positive subhalos out of all true positive subhalos (i.e., Recall = T P T P +F N , where FN is the number of false negative results).The F1 score is then defined as the harmonic mean of the precision and recall: A higher F1 score indicates a better classifier.The F1 is preferred over precision and recall since it is still a valuable metric for heavily imbalanced datasets.We define four metrics for the regression following Jespersen et al. (2022).The first is the scatter of the prediction residuals, where N is the number of galaxies in the test set, and ∆y i = log(y bar ) − log( y bar ) is the residual of a single prediction for a specific baryonic property in dex.4 ∆y is the average of the residual.The second metric is the bias, defined as Since scatter and bias can be strongly affected by a few outliers, and is also directly included in the Gaussian regression loss, we include two additional metrics.Our third metric is the Pearson correlation coefficient (ρ), which represents the linear correlation between the truth and the prediction of the model: where y i = log(y bar ), y i = log( y bar ), ȳ is the mean of y i , and ȳ is the mean of y i .The last metric is the coefficient of determination (R 2 ), which represents the proportion of the variance in the predicted population ŷ that can be explained by the true population y: A set of predictions all equal to the truth would result in ρ = R 2 = 1.

RESULTS
We first compare the performance of our model with a couple of popular methods used in studying galaxy-halo connection (Section 3.1), then describe in more details of the characteristics of our model prediction, paying special attention to the stellar mass growth history (Section 3.2).

Comparison with Existing Methods
We first compare our results with two other frameworks: an MLP that utilizes all information from the subhalo but lacks explicit merger information, and Abundance Matching (AM, e.g., Kravtsov et al. 2004;Chuang & Lin 2023), which is widely used for connecting subhalo masses with stellar masses.AM can be performed based on subhalo mass, or with other quantities such as the peak maximum circular velocity a subhalo ever attained (V peak ).Here, we compare two AM schemes, V max,90% and ψ 5 , as presented in Chuang & Lin (2023), which are shown to be better tracers of stellar mass and mass-dependent two-point correlation function (2PCF) than V peak and subhalo mass (please refer to the footnotes of Table 1 for the definitions of the two AM schemes).Furthermore, we compare our results with the chaotic uncertainty limit introduced by Genel et al. ( 2019), which represents a theoretical absolute lower limit on the optimal performance achievable by a perfect predictor (an ideal machine).The chaotic limit is found as the scatter between runs of simulations that only differ by an infinitesimal perturbation at z = 5.The results of the three frameworks (MLP plus two AM schemes), along with the chaotic uncertainty limits for the predicted properties at z = 0, are presented in Table 1.The GNN, utilizing the merger trees, outperforms the MLP and AM in all predicted features, except for a negligible difference in the bias, which is always small.However, the improvement from MLP to GNN is only substantial (exceeding 10%) for the scatters of M ⋆ , M gas and Z ⋆ , while the improvements in other properties are relatively small.

Analysis of the Predictions
Figure 3 shows the true and predicted values of all subhalos with M DM ≥ 10 11 h −1 M ⊙ and at z ≤ 2 for both GNN and MLP models.For both models, the scatter in the predictions of M ⋆ and Z gas does not vary significantly with the absolute values of these properties.However, the scatter in the predictions of SFR, Z * and M gas tends to be larger at lower values.The scatter of color tends to be larger for galaxies in between the red sequence and blue cloud (i.e., in the green valley).Comparing the predictions of GNN and MLP, the GNN shows a greater reduction in the scatter of M ⋆ , M gas and Z * predictions, particularly at lower values.Additionally, our GNN reduces the bias of color predictions for the green valley subhalos.However, the improvement in other baryonic features is not substantial.The bottom part of each panel in Figure 3 shows the number of subhalos in the predicted and true baryonic feature bins.Generally, the number distributions predicted by GNN and MLP are consistent with the true distribution for M ⋆ , M gas , Z * and SFR, while those of color and Z gas are less consistent.
To further investigate the scatter in M ⋆ , we show the stellar mass growth history in the main progenitor branches of the lower-mass [log(M ⋆ /M ⊙ ) = 10.0 − 10.5] and higher-mass [log(M ⋆ /M ⊙ ) = 10.5 − 11.5] galaxy populations in Figure 4a.The predicted and the true mass history exhibit excellent consistency in both populations up to z = 5. Figure 4b and Figure 4c show the errors in the stellar mass growth history of the two populations.These panels indicate that there is a small bias in the stellar mass growth history for both populations.While the bias of the lower mass population does not show dependence on redshift, that of the higher mass population becomes slightly larger at low-z.Additionally, the scatter for higher and lower mass populations increases slightly at high-z.Although our model achieves an accurate mapping between the dark matter halo merger history and some relevant baryonic properties of galaxies such as M ⋆ , Z ⋆ , and M gas , the predictions of SFR, color, and Z gas still show a relatively large scatter compared to the true values.This can be attributed to the stochastic nature of the star formation rate, gas metallicity, and color history.As shown in Table 1, these targets also have large chaotic uncertainties, and consistently show very little improvement when including merger history, since the large stochasticity implies that past information is not very informative.
Figure 5 shows the predicted and true histories of star formation, gas metallicity, and color for a few randomly selected subhalos.We have picked three subhalos for each feature, resulting in 9 different subhalos.The panels demonstrate that the GNN can generally capture the overall trends of SFR, color, and gas metallicity histories, but it struggles to recover the stochastic variations and therefore performs similarly to models not including merger histories.However, as discussed by Genel et al. (2019), part of this stochasticity is due to the ways star formation and feedback are implemented in IllustrisTNG, as well as the resolution effects, and thus, the role analogous processes play in nature is hard to pin down.Therefore, even an ideal machine would struggle to reproduce these quantities to a significantly higher fidelity.Regarding the color, the GNN accurately reproduces the subhalos in the red and blue populations; however, it struggles to predict exactly when the transition from the blue cloud to the red sequence occurs.This is consistent with the findings shown in Figure 3b, namely the scatter of subhalos in the green valley is larger compared to those in the red sequence and blue cloud.Nonetheless, for the population without a significant transition from the blue cloud to the red sequence, the stochastic variation is the main obstacle for GNN to predict the color history.
Table 1.Comparison of different methods with four of metrics.The best performance is shown in bold for each metric.We also include the chaotic uncertainty limit, σ0, which was introduced in Section 3.1.σ0 is the best possible regression performance.The Mangrovebased GNN always outperforms the MLP, meaning that including the merger history always improves predictions, although not always significantly.All the results are unbiased.The improvements are similar at both z = 0 and over the period of cosmic noon to the present.The small biases along with a more complete set of metrics can be found in Table 3 in Appendix D. Although our model achieves an accurate mapping between the dark matter halo merger history and some relevant baryonic properties of galaxies such as M⋆, Z⋆, and Mgas, the predictions of SFR, color, and Zgas still show a relatively large scatter compared to the true values.This can be attributed to the stochastic nature of the SFR, color, and Zgas history.To characterize the influence of the stochastic components on different baryonic features, a comparison of the GNN prediction on the true and smoothed baryonic feature history is presented in Table 2 a Defined as (σMLP − σX)/σ0, where X is the method (GNN or AM).
b Abundance matching using ψ5 ≡ V max,90% V max,90%@13.2 , where the parameters with a subscript @ are the normalization factor at a fitted pivot M DM,peak , V 90% is the 90th percentile for the maximum circular velocity (Vmax) throughout the lifetimes of subhalos, and | ṀDM|60% is the absolute subhalo dark matter mass variation rate at 60th percentile.c Abundance matching using V 90% .and c), gas metallicity evolution (panels d, e, and f), and color history (panels g, h and i) from the main primary branch of each merger tree.A Savitzky-Golay filter with a window size of 27 snapshots and an order of 3 is applied to the red curves to generate the black curves, which represent the overall behavior of the history of different stochastic features.We randomly choose a subhalo for each panel to plot their evolution.The panels demonstrate the ability of our GNN model to predict the overall behavior of stochastic baryonic features.However, it struggles to recover the stochastic variations, which might be caused by the IllustrisTNG implementation of the star formation mechanism and other feedback mechanisms, as well as the resolution effects.While the overall history of these properties is vital to understanding galaxy evolution, the actual role stochastic processes play in the Universe is hard to pin down as our model mainly relies on the subgrid physics implemented in the TNG simulations.
We investigate our assumption by assessing the model's performance under abrupt changes in SFR by measuring the derivative of the star formation history (SFH) in the main progenitor branch of the merger trees, similar to the method outlined in Chuang & Lin (2023).We divide the data into two groups based on the magnitude of the derivative of SFH.Specifically, we use a threshold of 10 M ⊙ yr −1 Gyr −1 to distinguish between a group with significant variations and another with smoother variations.Subsequently, we calculate the scatter of the SFR predictions for each group.The scatter and bias for the group with drastic variations are 0.38 and 0.19, respectively, whereas those for the smoother group are 0.33 and 0.001, respectively.These results show that our model performs less accurately when faced with a highly stochastic target.
To further characterize the influence of the stochastic components on different baryonic features, we extract the baryonic feature history along the main primary branch of each merger tree and apply a Savitzky-Golay filter Savitzky & Golay (1964) to remove the stochas-tic component.Then, we gauge the ability of the GNN model to predict the smoothed and unsmoothed baryonic feature history with scatter σ, ρ, and R 2 as presented in Table 2.The smoothed baryonic feature history (black curves) in Figure 5 generally follows the unsmoothed (true) baryonic feature history (red curves).For the color evolution, although the smoothed curve cannot represent the abrupt transition from the blue cloud to the red sequence, it can still trace the overall color history of the subhalos without the transition.For all features, the GNN traces the smoothed history better than that of the stochastic components, indicating the ability of our GNN model to predict the overall behavior of baryonic features while struggling to recover the stochastic variations.Nonetheless, the influences of the stochastic features on baryonic features are different.We demonstrate this point by calculating the relative difference in the scatter between the smoothed history and the GNN prediction based on the unsmoothed history.The difference is larger for SFR, color, and Z gas than for M ⋆ , Z ⋆ , and M gas , which indicates a larger influence of stochastic variation on SFR, color, and Z gas .
The higher accuracy of the predictions for M ⋆ , M gas , and Z ⋆ may imply that, the merger history of a galaxy is more important for determining M ⋆ , M gas , and Z ⋆ than for the other properties investigated in this work.However, as discussed above, this could also mean that we do not yet have enough simulated data to accurately learn the behaviors of a very stochastic target.
As the model continually improves with more data, the next generation of large hydrodynamical simulations will surely be able to improve the ability to accurately learn mappings between merger trees and baryons (Pakmor et al. 2022).There is also the possibility of improving our model with a GNN taking into account the environmental information like the one presented by Wu & Kragh Jespersen (2023).Improving the method we rearrange the merger trees could also improve our model.In Section 2.3, we assume that merger events are significant in the growth of dark matter subhalos, and therefore, we remove some of the original connections among subhalos in the merger trees that correspond to the smooth accretion mode of the subhalos' growth.Nonetheless, because the smooth accretion mode could also affect the evolution of baryonic features, leaving the original connections that represent the smooth accretion mode in the merger trees could also improve the performance of the GNN model.

Potential Applications of the Model
Despite the poorer performance for features with intrinsically large stochasticity, by leveraging the merger history, our model outperforms the state-of-the-art tools for learning the baryonic properties of galaxies and mapping these along the dark matter halo merger trees for all properties.The model works over a wide range of redshifts and works on every branch of all merger trees.Utilizing the model, it is possible to "emulate" the result of MHD simulations if one has a DMO simulation.One application of this is to recover the color-dependent 2PCFs in DMO simulations, given the color prediction from GNN.This provides an alternative way to test various AM schemes by conducting AM separately on blue and red subhalos and see whether the AM scheme can reproduce the color-dependent 2PCFs, such as the test performed in Chuang & Lin (2023).
Another potential application of our model is to apply it to the dark matter subhalos within constrained simulations.Presently, constrained simulations such as those conducted by Wang et al. (2016) and McAlpine et al. (2022), primarily employ DMO simulations, wherein the initial conditions are optimized to replicate the dark matter density fields of the local universe as inferred from observed galaxy distribution.After the optimization, each subhalo in the constrained simulation has one corresponding galaxy in the true universe.With our GNN model serving as an efficient and suitably accurate emulator, it becomes possible to swiftly incorporate baryonic properties into the DMO simulations.This offers the following benefits: • Previously, real galaxies and the subhalos in the MHD simulations have been compared only through statistical properties of certain populations of galaxies and subhalos, such as the fundamental plane (Lu et al. 2020), the star-forming main sequence (Speagle et al. 2014;Donnari et al. 2019), etc. Nonetheless, by comparing the baryonic properties of galaxies estimated by the GNN in the DMO subhalos with the properties of true galaxies, it becomes feasible to establish a oneto-one comparison between the subhalos incorporated with TNG physics learned from the GNN and actual galaxies.
• With the approach outlined in the previous bullet point, we can further compare the GNN-predicted SFH with those derived from spectral energy distribution (SED) fitting in order to calibrate existing SED fitting models, such as the one presented in Abdurro'uf et al. (2021).
• By training the GNN with the merger trees in the constrained simulations and the baryonic properties of true galaxies, it is possible to learn the relation between merger trees and galaxies in the universe.
• Assuming that the last step can be done, it becomes possible to directly optimize the initial conditions of DMO simulations to match the distribution of galaxies instead of relying on the inferred density field.This is achievable because we can estimate whether a subhalo contains galaxies and what baryonic properties the galaxy residing in the subhalo has.
Table 2. Comparison of the GNN prediction on the true and smoothed baryonic feature history with three of the metrics.The metrics subscripted with "Main" and "Smooth" are the metrics calculated with the baryonic feature history from the main primary branch and the smoothed baryonic feature history, respectively.A Savitzky-Golay filter with a window size of 27 snapshots and an order of 3 is applied to the baryonic feature history to remove the stochastic component and generate the smoothed baryonic feature history.The best performance is shown in bold for each metric.For all features, GNN traces the smoothed history better than that with the stochastic components, indicating the ability of our GNN model to predict the overall behavior of baryonic features while struggling to recover the stochastic variations.Generally, for all baryonic features, the larger improvement from σMain to σ Smooth would indicate larger stochasticity in the baryonic feature history.The numerical work was conducted on the highperformance computing facility at the Institute of Astronomy and Astrophysics in Academia Sinica (https://hpc.tiara.sinica.edu.tw).
The IllustrisTNG simulations were undertaken with compute time awarded by the Gauss Centre for Supercomputing (GCS) under GCS Large-Scale Projects GCS-ILLU and GCS-DWAR on the GCS share of the supercomputer Hazel Hen at the High Performance Computing Center Stuttgart (HLRS), as well as on the machines of the Max Planck Computing and Data Facility (MPCDF) in Garching, Germany.The hyper-parameters we searched for GNN include the number of MLP layers {2, 3} to decode the node features from the GNN, the number of MetaLayers {2, 3, 4}, different aggregation methods (sum or mean aggregation), the initial learning rate {0.01, 0.001} for the OneCycleLR policy (Smith & Topin 2017), the relative strengths of classification loss with respect to the regression loss (h in Equation 4, {1.0, 100.0}), and the relative strengths of L1 and L2 norm of all parameters in the GNN with respect to the regression loss (l 1 and l 2 in Equation 4, {0.0, 0.0001}).The best combination of the hyper-parameters is shown in bold.Due to the limitation of the computing resources, we are only able to search a small fraction of the parameter space for each hyper-parameter.We welcome others to test our GNN model in more detail.The specifics of the computational resources used are listed below:  2) and the residual between the prediction and the true value (∆y i in Equation 6) for each baryonic feature.In general, the scatter of all predicted features tends to increase with higher predicted uncertainty, which indicates that the predicted Gaussian uncertainties faithfully reflect the model's ability to predict a given target for a given merger tree.Although the data point densities exhibit an overall axial symmetry along the axis of ∆y i = 0, asymmetric features appear with higher predicted uncertainties for SFR, color, M gas , and Z ⋆ .Another noticeable feature in Figure 6 is the bimodality in the uncertainties for SFR and Z ⋆ .
The asymmetric features could be attributed to the overestimation of values with high stochasticities.As described in Section 4, we divide the data into two groups based on the magnitude of the derivative of the SFH.We then measure the mean uncertainty of the SFR for the two groups.The mean uncertainty for the group with drastic variations is 1.89, whereas that for the smoother group is 1.43.These results show that our model predicts a higher uncertainty for targets with higher stochasticity, while Section 4 indicates that bias is also more significant with higher stochasticity.Thus, the targets with higher uncertainty show a positive residual in Figure 6b.
Since the model is minimized with a Gaussian loss, we expect that the pull/z-score (z ≡ ∆y/σ) should show a Gaussian distribution, while the χ 2 N = χ 2 /N should be close to 1. Figure 7 shows the distribution of z-scores (the pull plot) and a unit Gaussian.Similar to Jespersen et al. (2022), the z-score distributions in Figure 7 can be approximated by a unit Gaussian, indicating that error estimation is highly accurate.Figure 6.Each panel shows the predicted uncertainty and the residual between the prediction and the true value for a specific baryonic feature (indicated at the bottom of each panel).The data point densities are normalized such that the maximum value is one.The y-axis in each panel is logarithmically scaled.In general, the model successfully assigns large uncertainties to targets that are likely to have large residuals.Although the data point densities exhibit an overall axial symmetry along the axis of ∆yi = 0, asymmetric features appear with higher predicted uncertainties for SFR, color, Mgas, and Z⋆.The asymmetric features could be due to the overestimation of values with high stochasticities, which results in higher predicted uncertainty.N for each feature are displayed in the upper-left of each panel.In general, although asymmetric features appear in Figure 6, the overall distributions of z-scores can still be approximated by a Gaussian.

D. FULL TABLE OF METRICS
3 Figure 1.(a) An arbitrarily chosen merger tree depicting a subhalo with a final subhalo mass of MDM ∼ 10 11 M⊙ (with M⋆ ∼ 10 10 M⊙).The x-axis represents the age of the universe at each snapshot, while the color indicates the dark matter halo mass.The nodes are categorized into 5 groups (to be described in Section 2.3.1 and Figure 2b), which correspond to different shapes in the figure.(b) A comparison of true (red data points) and predicted (blue data points) logarithmic stellar masses (shown on the z-axis) in a sample merger tree with a final stellar mass of ∼ 10 11 M⊙.The color in the merger tree on the x − y plane indicates the prediction error, measured in dex.Our work mainly focuses on the subhalos with MDM ≥ 10 11 h −1 M⊙, and thus, the other subhalos are omitted.

Figure 2 .
Figure2.(a) A schematic view of the GNN workflow in a single MetaLayer.Merger trees exhibit intricate structures, yet they can be handled through three fundamental graph components: nodes (V ), edges (E), and global attributes (U ).The red parts in rows (i), (ii), and (iii) indicate the node, edge, and global features.When a graph is processed through a MetaLayer, three learnable update functions (ϕ e , ϕ v , and ϕ u ) and three aggregation functions (ρ e→v , ρ e→u , and ρ v→u ) are applied to the input graph.The computation proceeds from the edge to the node and, finally, to the global level.Columns (x), (y), and (z) indicate the graph elements that are involved in each of these computations, respectively.The green color indicates which sub-model is being updated in each column, and the yellow color represents the additional elements that are involved in the update.The green arrows connect the three fundamental components (node, edge, and global attributes) and the sequence of functions applied to them.The process shows the following steps: (1) ϕ e is applied edge-wise, which takes the attributes from the upstream nodes (V i ), the downstream nodes (V j ), and the edge itself (E), and returns an updated version of the edge attribute (E ′ ).(2) ρ e→v is applied for each node, which takes all E ′ s that project to the node and outputs E ′ v .(3)ϕ v is applied node-wise by taking U , V and E ′ v as the input and returning the updated node attributes (V ′ ).(4) ρ e→u and ρ v→u are applied to all the edges and nodes in the graph, to yield globally summarized edge E ′ u and node (V ′ u ) properties, respectively.(5) Finally, ϕ u is applied to each full graph, taking E ′ u , V ′ u , and U as the input to update the global attribute (U ′ ).Now, we have the updated edge, node and global properties for the graph.(b) A simple merger tree, as described in Section 2.1, to illustrate the connections between different nodes.The red arrows denote the original merger tree, while the yellow arrows depict the rearranged edge connections described in Section 2.3.1.The rearranged edges allow the latest merger events to have added impact on all subhalos following a merger.Yet, by the rearrangement, we introduce a strong inductive bias that assumes merger events are significant in the growth of dark matter subhalos due to the removal of the smooth accretion mode in subhalos' growth (e.g.removing the connection between nodes F, E, and D).
Assessing and Understanding the Performance of our Model

Figure 3 .
Figure 3.The upper part of each panel shows the comparison of predicted and true values of (a) stellar mass, (b) color (g − r), (c) stellar metallicity, (d) gas mass, (e) SFR, and (f) gas metallicity.The color scale indicates the density of the GNN-predicted data points, normalized such that the maximum value is one.The dashed and solid contours represent the MLP-predicted and GNN-predicted baryonic features, respectively.The improvement from an MLP to our GNN is always visible.The lower part of each panel displays the number of subhalos in the predicted and true baryonic feature bins, also normalized such that the maximum value is one for the true distribution.The solid blue lines indicate the true distribution, the solid orange lines represent the GNN-predicted distribution, and the dashed green lines correspond to the MLP-predicted distribution.The distributions mostly follow each other, but both ML methods encounter difficulties when a bimodality exists in the distribution.

Figure 4 .
Figure 4. Panel (a): The stellar mass growth history in the primary branches of subhalos, for two stellar mass bins chosen at z = 0 as indicated in the inset.The y-axis shows the stellar mass in the log-scale.The solid lines represent the true values, while the dash-dotted lines correspond to the GNN-predicted 16th, 50th, and 80th percentiles for the two mass bins.Panels (b) and (c): A detailed view of the difference between the true and the GNN-predicted stellar mass growth history.The color scale indicates the density of data points in different mass bins (indicated at the top of each panel), normalized such that the maximum value is one.The red lines indicate the 16th and 84th percentiles of the error at each redshift, while the blue ones represent the median error.

Figure 5 .
Figure5.The predicted (blue), true (red), and smoothed (black) value of star formation history (panels a, b, and c), gas metallicity evolution (panels d, e, and f), and color history (panels g, h and i) from the main primary branch of each merger tree.A Savitzky-Golay filter with a window size of 27 snapshots and an order of 3 is applied to the red curves to generate the black curves, which represent the overall behavior of the history of different stochastic features.We randomly choose a subhalo for each panel to plot their evolution.The panels demonstrate the ability of our GNN model to predict the overall behavior of stochastic baryonic features.However, it struggles to recover the stochastic variations, which might be caused by the IllustrisTNG implementation of the star formation mechanism and other feedback mechanisms, as well as the resolution effects.While the overall history of these properties is vital to understanding galaxy evolution, the actual role stochastic processes play in the Universe is hard to pin down as our model mainly relies on the subgrid physics implemented in the TNG simulations.
Figure6shows the predicted uncertainty ( σ in Equation2) and the residual between the prediction and the true value (∆y i in Equation6) for each baryonic feature.In general, the scatter of all predicted features tends to increase with higher predicted uncertainty, which indicates that the predicted Gaussian uncertainties faithfully reflect the model's ability to predict a given target for a given merger tree.Although the data point densities exhibit an overall axial symmetry along the axis of ∆y i = 0, asymmetric features appear with higher predicted uncertainties for SFR, color, M gas , and Z ⋆ .Another noticeable feature in Figure6is the bimodality in the uncertainties for SFR and Z ⋆ .The asymmetric features could be attributed to the overestimation of values with high stochasticities.As described in Section 4, we divide the data into two groups based on the magnitude of the derivative of the SFH.We then measure the mean uncertainty of the SFR for the two groups.The mean uncertainty for the group with drastic variations is 1.89, whereas that for the smoother group is 1.43.These results show that our model predicts a higher uncertainty for targets with higher stochasticity, while Section 4 indicates that bias is also more significant with higher stochasticity.Thus, the targets with higher uncertainty show a positive residual in Figure6b.Since the model is minimized with a Gaussian loss, we expect that the pull/z-score (z ≡ ∆y/σ) should show a Gaussian distribution, while the χ 2 N = χ 2 /N should be close to 1. Figure7shows the distribution of z-scores (the pull plot) and a unit Gaussian.Similar toJespersen et al. (2022), the z-score distributions in Figure7can be approximated by a unit Gaussian, indicating that error estimation is highly accurate.

Figure 7 .
Figure 7.Each panel shows the distribution of pulls/z-scores (z-scorei = ∆yi/ σi, shown in blue), and a unit Gaussian (shown in red) for reference.The mean, variance, and χ 2N for each feature are displayed in the upper-left of each panel.In general, although asymmetric features appear in Figure6, the overall distributions of z-scores can still be approximated by a Gaussian. .

Table 3 .
The full table of metrics used to quantify the performance of GNN, MLP, and AM.Please refer to Table1for the definition.cPleaserefer to Table1for the definition.