The Effect of Inefficient Accretion on Planetary Differentiation

Pairwise collisions between terrestrial embryos are the dominant means of accretion during the last stage of planet formation. Hence, their realistic treatment in N-body studies is critical to accurately model the formation of terrestrial planets and to develop interpretations of telescopic and spacecraft observations. In this work, we compare the effects of two collision prescriptions on the core-mantle differentiation of terrestrial planets: a model in which collisions are always completely accretionary (``perfect merging'') and a more realistic model based on neural networks that has been trained on hydrodynamical simulations of giant impacts. The latter model is able to predict the loss of mass due to imperfect accretion and the evolution of non-accreted projectiles in hit-and-run collisions. We find that the results of the neural-network model feature a wider range of final core mass fractions and metal-silicate equilibration pressures, temperatures, and oxygen fugacities than the assumption of perfect merging. When used to model collisions in N-body studies of terrestrial planet formation, the two models provide similar answers for planets more massive than 0.1 Earth's masses. For less massive final bodies, however, the inefficient-accretion model predicts a higher degree of compositional diversity. This phenomenon is not reflected in planet formation models of the solar system that use perfect merging to determine collisional outcomes. Our findings confirm the role of giant impacts as important drivers of planetary diversity and encourage a realistic implementation of inefficient accretion in future accretion studies.


INTRODUCTION
Collisions between similar-size planetary bodies ("giant impacts") dominate the final stage of planet formation (Wetherill 1985;Asphaug 2010). These events generally result in the formation of transient magma oceans on the resulting bodies (Tonks & Melosh 1993;de Vries et al. 2016). During the last decade, a series of studies combined core-mantle differentiation with accretion modeling to put constraints on how terrestrial planets' cores formed in the solar system and showed that core formation of terrestrial planets does not occur in a single stage, but rather that it is the result of a multistage process, i.e., a series of metal-silicate equilibrations (e.g., Rubie et al. 2011Dwyer et al. 2015;Bonsor et al. 2015;Carter et al. 2015;Rubie et al. 2016;Fischer et al. 2017;Zube et al. 2019). In particular, the core formation model from  uses rigorous chemical mass balance with metal-silicate element partitioning data and requires assumptions regarding the bulk compositions of all starting embryos and planetesimals as a function of heliocentric distance. The differentiation of terrestrial planets is modeled as the separation of the iron-rich metal from silicate material using metalsilicate partitioning of elements to determine the evolving chemistry of the two reservoirs. New insights into terrestrial planet formation have been enabled by this equilibration model. For example,  demonstrated that Earth likely accreted from heterogeneous reservoirs of early solar system materials, Rubie et al. (2016) demonstrated that iron sulfide segregation was responsible for stripping the highly siderophile elements from Earth's mantle, and Jacobson et al. (2017) proposed that Venus does not have an active planetary dynamo because it never experienced late Moon-forming giant impacts.
These previous studies of planet differentiation interpreted the results of N -body simulations of terrestrial planet formation where collisions were treated as perfectly inelastic, which computer models of giant impacts have shown is an oversimplification of the complex collision process (e.g., Agnor & Asphaug 2004;Stewart & Leinhardt 2012). Inelastic collisions, or "perfect merging", assumes the projectile mass (M P ) merges with the target mass (M T ) to form a body with mass M T + M P . However, in nearly all giant impacts, escaping debris is produced, and the projectile's core does not simply descend through the magma ocean and merge with the target's metal core. Instead, half of all collisions are 'hitand-run', where the projectile escapes accretion (Agnor & Asphaug 2004;Kokubo & Genda 2010) and may never re-impact the target again (Chambers 2013;Emsenhuber et al. 2020). In these events, partial accretion may occur between the metal and silicate reservoirs of the target body and the projectile (or 'runner'). To accurately model the geochemical evolution of the mantles and cores of growing planetary bodies, it is thus necessary to account for the range of accretionary (or nonaccretionary) outcomes of giant impacts.

Beyond perfect merging
High-resolution hydrocode computer simulations of collisions provide a description of the outcomes of giant impacts, which can then be incorporated into planet formation models to produce higher-fidelity predictions. But each giant impact simulation requires a long computational time to complete (on the order of hours to days depending on the resolution and computing resources). Since a large number of collisions may occur during late-stage terrestrial planet formation (up to order of 10 3 , e.g., Emsenhuber et al. 2020), it is impractical to model each impact "on-the-fly" by running a full hydrocode simulation at a resolution that is sufficient to make meaningful predictions.
Several previous studies focused on overcoming both the assumption of perfect merging and the aforementioned computational bottleneck. These works employed various techniques to resolve different aspects of giant impacts. Commonly, scaling laws and other algebraic relationships (e.g. Holsapple & Housen 1986;Housen & Holsapple 1990;Holsapple 1994;Kokubo & Genda 2010;Leinhardt & Stewart 2012;Genda et al. 2017) are utilized during an N -body (orbital dynamical) planet formation simulation to predict the collision outcome for the masses of post-impact remnants (e.g. Chambers 2013;Quintana et al. 2016;Clement et al. 2019). The post-impact information is then fed back into the N -body code for further dynamical evolution. Other studies 'handed off' each collision scenario to a hydrodynamic simulation in order to model the exact impact scenario explicitly, and fed post-impact information back to the N -body code (Genda et al. 2017;Burger et al. 2020). The latter methodology is the most rigourous, but also the most computationally demanding, as it requires to run a hydrodynamic calculation for every one of the hundreds of collisions in an N -body simulation, each of which requires days of computer time when using modest resolution.
Alternatively, Cambioni et al. (2019, termed C19 hereafter) proposed a fully data-driven approach, in which machine-learning algorithms were trained on a data set of pre-existing hydrocode simulations (Reufer 2011;Gabriel et al. 2020). The machine-learned functions ("surrogate models") predict the outcome of a collision within a known level of accuracy with respect to the hydrocode simulations in an independent testing set. This process is fully data-driven and does not introduce model assumptions in the fitting, which is in contrast to scaling laws composed of a set of algebraic functions based on physical arguments (e.g., Leinhardt & Stewart 2012;Gabriel et al. 2020). The surrogate models are fast predictors and Emsenhuber et al. (2020, termed E20 hereafter) implemented them in a code library named collresolve 1  to realistically treat collisions on-the-fly during terrestrial planet formation studies. When collresolve is used to treat collisions in N -body studies, the final planets feature a wider range of masses and degree of mixing across feeding zones in the disk compared with those predicted by assuming perfect merging. Although E20 ignored debris re-accretion -and we use these dynamical simulations for the study herein-their results suggest that composition diversity increases in collision remnants. This is something that cannot be predicted by models that assume perfect merging.

This work
In this paper, we compare the collision outcome obtained assuming perfect merging with that predicted by the more realistic machine-learned giant impact model of C19 and E20. In the former case, debris is not pro-duced by definition and in the latter case, debris is produced but not re-accreted. In this respect, our goal is not to reproduce the solar system terrestrial planets, but to investigate whether or not the two collision models produce different predictions in terms of terrestrial planets' core-mantle differentiation at the end of the planetary system's dynamical evolution.
C19 and E20 developed models for the mass and orbits of the largest post-impact remnants; here we go a step further and develop a model for the preferential erosion of mantle silicates and core materials. To do so, we train two new neural networks to predict the core mass fraction of the resulting bodies of a giant impact. We describe the data-driven model of inefficient accretion by C19 and E20 in Section 2 and its implementation in the core-mantle differentiation model by  in Section 3.
We compare the perfect merging and inefficientaccretion models in two ways: (1) by studying the case of a single collision between two planetary embryos (Section 4); and (2) by interpreting the effect of multiple giant impacts in the N -body simulations of accretion presented in E20 (Section 5). During the accretion of planets through giant impacts between planetary embryos, we focus on the evolution of those variables that control planetary differentiation: mass, core mass fraction, as well as metal-silicate equilibration pressure, temperature, and oxygen fugacity. Other factors that may alter composition and thermodynamical evolution indirectly, e.g., atmospheric escape and radiative effects, are not covered in these models.

INEFFICIENT-ACCRETION MODEL
The data-driven inefficient-accretion model by C19 and E20 consists of applying machine learning to the prediction of giant impacts' outcomes based on the preexisting set of collision simulations described below in Section 2.1. By training on a large data set of simulations of giant impacts, this approach allows producing response functions (surrogate models) that accurately and quickly predict key outcomes of giant impacts needed to introduce realistic collision outcomes "on-thefly" of an N -body code.

Data set of giant impact simulations
The data set used in C19 and E20 and in this work is composed of nearly 800 simulations of planetary collisions performed using the Smoothed-Particle Hydrodynamics (SPH) technique (see, e.g., Monaghan 1992; Rosswog 2009, for reviews) obtained by Reufer (2011) and further described in Gabriel et al. (2020). They have a resolution of ∼ 2 × 10 5 SPH particles. All bodies are differentiated with a bulk composition of 70 wt% silicate and 30 wt% metallic iron, where the equation of state for iron is ANEOS (Thompson & Lauson 1972) and M-ANEOS for SiO 2 (Melosh 2007). The data set spans target masses M T from 10 −2 to 1 M ⊕ , projectileto-target mass ratios γ = M P /M T between 0.2 and 0.7, all impact angles θ coll , and impact velocities v coll between 1 and 4 times the mutual escape velocity v esc , where which represents the entire range of expected impact velocities between major bodies from N -body models (e.g. Chambers 2013; Quintana et al. 2016). In Equation 1, G is the gravitational constant, and R T and R P are the bodies' radii. We refer to C19, E20, and Gabriel et al. (2020) for more information about the data set. An excerpt of the data set is reported in Table 1. The data set is provided in its entirety in the machine-readable format. Designing the surrogate models described in the following Sections requires running hundreds of SPH simulations and training the machine-learning functions. The computational cost of this procedure, however, is low when compared against the computational resources necessary to solve each collision in a N -body study with a full SPH simulation at the same particle resolution of the simulations used in C19 and E20 for each event instead of using the surrogate models. Each giant impact simulation requires a long computational time to complete, on the order of hours to days depending on the resolution and computing resources, while the surrogate models, once constructed, provide an answer in a fraction of a second (C19; E20).

Surrogate model of accretion efficiencies
In order to assess the accretion efficiency of the target across simulations of varying masses of targets and projectiles, E20 normalize the change in mass of the largest remnant, assumed to be the post-impact target, by the projectile mass (Asphaug 2010): where M L is the mass of the largest single gravitationally bound remnant. Accretion onto the target causes ξ L > 0, while negative values indicate erosion. This accretion efficiency is heavily dependent on the impact velocity relative to the mutual escape velocity v esc , especially in the critical range ∼ 1-1.4 v esc , which encompasses ∼90% of the probability distribution of impact velocities between major remnants in the N -body simulations by  Chambers (2013) which include collision fragmentation (Gabriel et al. 2020).
Similarly, for the second largest remnant with mass M S , assuming it is the post-impact projectile, i.e., the runner in a hit-and-run collision (Asphaug et al. 2006), E20 define a non-dimensional accretion efficiency again normalized by the projectile mass: The value ξ S is almost always negative, as mass transfer from the projectile onto the target occurs also in case of projectile survival (Asphaug et al. 2006;, and loss to debris can occur. The mass of the debris M D is computed from mass conservation. If the debris creation efficiency is defined as: ξ D = M D /M P , then ξ L + ξ S + ξ D = 0. In E20, the quantities of Equations 2 and 3 were used to train a surrogate model of accretion efficiencies. This is a neural network, that is, a parametric function trained to mimic the "parent" SPH calculation as an input-output function, in order to predict real-variable outputs given the four impact parameters (predictors): mass of the target, projectile-to-target mass ratio, impact angle, and impact velocity. The data set entries are of the type: The surrogate model is assessed in its training success and predictive capabilities by means of the mean squared error and correlation coefficient for each quantity Q, where Q NN and Q SPH indicate the predictions by the Neural Networks and the correspondent outcome from the SPH simulations with standard deviations σ NN and σ SPH , respectively. The goal is to achieve a mean squared error as close to zero and a correlation coefficient as close to 100% as possible on a testing data set, which comprises data that were not used for training. The surrogate model of accretion efficiency is able to predict the mass of the largest and second largest remnants with a mean squared error at testing equal to 0.03 and a correlation coefficient greater than 96%. Importantly, although the surrogate model has a high global accuracy, inaccurate predictions can still occur locally in the parameter space (C19). These inaccurate predictions, however, are not systematic; the distributions of the residuals ∆ξ between accretion efficiency predictions Q N N and target values Q SP H for the largest remnant and second remnant are well fit by Gaussian distributions N (µ, σ) = N (0.0, 0.1) and N (0.02, 0.09), respectively, where µ is the mean of the residuals and σ is the standard deviation of the residuals, so that N (0, 0) is the distribution associated with a noiseless surrogate model. The noise level is comparable to the numerical precision of the SPH simulations (∼ 0.1-0.15 in units of accretion efficiency). High-inaccuracy predictions (i.e., |∆ξ| > 0.5) account for just 3.7% and 2.6% of the overall set for largest and second remnants respectively, and tend to cluster in proximity of the boundaries between different regimes, as previously discussed in C19.

Classifier of collision types
In E20, the data set of SPH simulations of Section 2.1 was also used to train a classifier which provides predictions of the type of collisions (classes, or responses) based on the following mass criterion: accretion (M L > M T and M S < 0.1M P ), erosion (M L < M T and M S < 0.1M P ), and hit-and-run collision (M S > 0.1M P ). The data set entries are of the type: As opposed to the prediction by the surrogate model of accretion efficiency described in Section 2.2, the prediction of the classifier is categorical in type and its accuracy is computed as the mean value of the correct classifications over the whole population at testing. This accuracy is equal to 95% globally, 83.3% on the "erosion" type, 91.7% for the "accretion" type, and 98.0% for the hit-and-run collision type.

Surrogate models of core mass fraction
We use the same SPH data set as C19 and E20 to train two new surrogate models to predict the core mass fractions of the largest and second largest remnants. Each remnant's core mass fraction is obtained by accounting for all material in the SPH simulations. This includes all gravitationally-bound material such as potential silicate vapour resulting from energetic collisions involving larger bodies or impact velocities. The core mass fractions of the target and projectile are termed Z T and Z P , respectively. Their initial values are always equal to 30% in the dataset of SPH simulations used here.
For the first new surrogate model, we train, validate, and test a neural network using a data set with entries: where Z L is the largest remnant's core mass fraction. In the hit-and-run regime only, we train a second neural network to predict the core mass fraction of the second largest remnant. For this surrogate model, the data set has entries: where Z S is the post-collision core mass fraction of the second largest remnant. Following the approach described in C19 and E20, the training of the networks is performed on 70% of the overall data set. The rest of the data are split between a validation set (15%) and a testing set (15%) via random sampling without replacement. Each neural network architecture consists of an input layer with 4 nodes (as many as the impact properties), one or more hidden layers and an output node. The number of hidden layers, number of neurons in each hidden layer, and the neurons activation functions (that is, the functions that define the output of the neurons given a set of inputs) are among the "hyperparameters" of the network. The optimal hyperparameters are not learned during training, but found through minimization of the mean squared error on the validation set (Equation 5). Additional hyperparameters include the choice of the training algorithm, the intensity of the regularization of the training cost function aimed to avoid the choice for too "complex" models (e.g., Girosi et al. 1995) and the strategy of data normalization before training. The optimal neural network architectures have 10 neurons in the hidden layer with an hyperbolic tangent sigmoid activation function (Equation 13 in C19). The inputs and targets are normalized in the range [-1, 1]. The regularization process (e.g., Girosi et al. 1995) has a strength equal to 5.48 × 10 −6 and 3.80 × 10 −5 for the Z L and Z S networks, respectively.
Each network is trained using the Levenberg-Marquardt algorithm described in Hagan et al. (1997). The learning dynamics (i.e., evolution of the mean squared error for training, validation, and testing at different epochs of training procedure) are plotted in Figure 1 for the largest and second largest remnants (left and right panels, respectively). At every training epoch, the weights of the networks are updated such that the mean squared error on the training data set gets progressively smaller. The mean squared errors converge in about 6 epochs for the surrogate model of the largest remnant and 4 for that of the second remnant.
Once trained, the predictive performance of the networks are quantified by the mean squared error (red curves in Figure 1, Equation 5) and correlation coefficient (box plots, Equation 6) on the testing dataset. The mean squared error at convergence is equal to about 2 × 10 −4 with a correlation coefficient of above 96% for the largest remnant, and 8 × 10 −4 with a correlation coefficient above 93% for the second largest remnant.
In Figure 1, the differences between training, validation, and testing mean squared errors at convergence are smaller than 0.01 in units of M core /M planet . The mean squared errors on the testing set are smaller or equal to those on the training set, indicating that the trained algorithms generalize well the prediction to new cases not learned during training. The validation mean squared errors are also smaller than the training errors, which indicates that the trained algorithms are not overfitting the training datasets. These (small) differences between mean squared errors may also reflect differences in variance of the datasets, which we attempted to mitigate by populating the sets using random sampling without replacement.
Although the mean squared errors are globally low, inaccurate predictions may still occur locally in the parameter space. The distributions of the residuals in core mass fraction between Q N N and target values Q SP H for largest remnant and second remnant values, however, are well fit by Normal distributions N (0%, 2%) and N (0%, 1%), respectively. Despite the penury of data at large core mass fractions (i.e., Z > 40%), the surrogate model is found to be accurate in this regime too: the residuals are well fit with N (3%, 4%) and N (−3%, 6%) for the largest and second largest remnants, respectively. This noise level is comparable to the numerical uncertainty of the SPH simulations in predicting the core mass fraction of terrestrial planets, which we estimate to be 2-5 % for accretionary and erosive events, and in the order of 10% for erosive hit-and-run events. Predictions with residual > 10% account for just the 3.5% and 7.8% of the overall set for largest and second remnants respectively. As expected, we find that these inaccurate predictions tend to cluster in proximity of the boundary between hit-and-run and erosive collisions (i.e., disruptive hit-and-runs, Asphaug & Reufer 2014), in which substantial debris is produced in the erosion of both the targets' and projectiles' mantle, and where the SPH simulator is also expected to be the most noisy.

Accretion efficiencies of the core and the mantle
In order to track how the core mass fraction of a growing planet evolves through the giant impact phase of accretion, we split the change in mass from the target to the largest remnant into a core and mantle component: T are the changes in mass of the core and mantle of the largest remnant from the target as indicated by the superscripts "c" and "m", respectively. Similarly, in the hit-and-run collision regime, we do the same for the change in mass from the projectile to the second largest remnant: where ∆M c S = M c S − M c P and ∆M m S = M m S − M m P are defined similarly as the largest remnant case.
The remnant bodies after a giant impact may have different core mass fractions than either of the pre-impact bodies (i.e. target and projectile) or each other. For instance, a projectile may erode mantle material from the target in a hit-and-run collision, but during the same impact the target may accrete core material from the projectile. In order to quantify and study these possibilities, we define distinct accretion efficiencies for each remnants' core (ξ c L and ξ c S ) and mantle (ξ m L and ξ m S ), so that, after dividing through by the projectile mass for normalization, the above expressions are transformed into: where we define the core and mantle component accretion efficiencies in the same manner as the overall accretion efficiencies of the remnant bodies: where on the right-hand sides of Equations 14-17, we express the core and mantle accretion efficiencies in terms found in Table 1: the initial projectile-to-target mass ratio (γ), the overall accretion efficiencies of the remnants (ξ L and ξ S ), and the core mass fractions of the final bodies (Z L and Z S ).
In the SPH simulations discussed in Section 2.1, the core mass fractions of the initial bodies are always Z T = Z P = 30%. The approach adopted for cases of collisions between bodies with core mass fraction different from 30% -which are expected to occur in planet formation studies -is discussed in Section 5.2.

PLANETARY DIFFERENTIATION MODEL
After a giant impact, the surviving mantle of a remnant body is assumed to equilibrate with any accreted material, in a magma ocean produced by the energetic impact. This equilibration establishes the composition of the cooling magma ocean and also potentially involves dense Fe-rich metallic liquids that segregate to the core due to density differences. In this Section, we describe how the inefficient-accretion model of Section 2 is implemented into the planetary accretion and differentiation model published by Rubie et al. (2011Rubie et al. ( , 2016. We direct the reader to the manuscripts by Rubie et al. for a more detailed description of the metal-silicate equilibration approach itself.

Identification of silicate and metallic reservoirs
Regardless of the impact conditions, in the perfectmerging model the metallic core of the projectile plunges into the target's magma ocean turbulently entraining silicate liquid in a descending plume (Deguen et al. 2011;Rubie et al. 2003). In the inefficient-accretion model, if the collision is a hit-and-run or erosive, the metallic core of the projectile does not plunge into the target's magma ocean, but some of the collision energy may be still delivered to the core-mantle boundaries of the colliding pair, potentially inducing some mixing of the metal and silicate reservoirs.
As today, however, there are no clear recipes for the quantification of the degree of mixing at the core-mantle boundary and re-equilibration of such mixture in case of hit-and-run and erosive collisions. Here, we explore two end-member scenarios: (1) no mixing between the reservoirs and, hence, no silicate-metal re-equilibration; and (2) the metal and silicate reservoirs are fully mixed and the mixture re-equilibrates. It is important to note that these two end-member scenarios are both unlikely, especially the second one, and that the reality of the degree of mixing and equilibration lies in-between these two end-members. In particular, previous studies do not give much support for the idea of full re-equilibration (e.g., Nakajima & Stevenson 2016;Carter et al. 2015), as this would require the delivery of a large input of energy at the core-mantle boundary. We therefore use these two end-members to bound the problem for the sake of studying the effect of switching on/off re-equilibration on planetary differentiation.
In the flowchart of Figure 2, we outline the steps for identifying equilibrating silicate and metal reservoirs from surviving target and accreted projectile material in the remnants' mantle. These steps are: 1. The classifier of collision type (Section 2.3) is used to determine the number of resulting bodies of a collision. In case there is only a single remnant (accretion or erosion regimes), the core of the projectile may be either accreted or obliterated. This is determined by looking at the sign of the largest remnant's core accretion efficiency ξ c L .
2a. If ξ c L is positive and the event is not an hit-and-run collision, the metallic core of the projectile plunges into the target's magma ocean turbulently entraining silicate liquid in a descending plume (Deguen et al. 2011;Rubie et al. 2003). The plume's silicate content increases as the plume expands with increasing depth. This determines the volume fraction of metal φ met : where α = 0.25 (Deguen et al. 2011), r 0 is the initial radius of the projectile's core and z is depth in the magma ocean. Equation 18 allows estimating the mass fraction of the embryo's mantle material that is entrained as silicate liquid since the volume of the descending projectile core material is known. Following chemical equilibration during descent, any resulting metal is added to the proto-core and the equilibrated silicate is mixed with the fraction of the mantle that did not equilibrate to produce a compositionally homogeneous mantle , under the assumption of vigorous mixing due to mantle convection.
2b. If ξ c L is negative, the target's core and mantle are eroded and the projectile is obliterated. For this case, we explore the two assumptions of no reequilibration and full re-equilibration between the silicate and metal reservoirs of the largest remnant.
3. For hit-and-run collisions, the projectile survives accretion and becomes the second remnant of the collision. There may be mass transfer between the colliding bodies as predicted using the surrogate models of Section 2.2 and Section 2.4.For this case, we explore the two assumptions of no re-equilibration and full-equilibration between the silicate and metal reservoirs of the two resulting bodies.

Mass balance
After each accretionary giant impact and in case of full-equilibration for hit-and-run and erosive collisions, there is re-equilibration between any interacting silicaterich and metallic phases in the remaining bodies, which ultimately determines the compositions of the mantle and core. The volume of interacting material is determined as described in Section 3.1 and shown schematically in Figure 2. The planetary differentiation model tracks the partitioning of elements in the post-impact magma ocean between a silicate phase, which is modeled as a silicate-rich phase mainly composed of SiO 2 , Al 2 O 3 , MgO, CaO, FeO, and NiO, and a metallic phase, which is modeled as a metal reservoir mainly composed of Fe, Si, Ni, and O. Both the silicate-rich and metal phases include the minor and trace elements: Na, Co, Nb, Ta, V, Cr, Pt, Pd, Ru, Ir, W, Mo, S, C, and H.
The identified silicate-rich and metal phases equilibrate as described by the following mass balance equation: Silicate Liquid (1) + Metal Liquid (1) =⇒ Silicate Liquid (2) + Metal Liquid (2) where the flag "1" indicates the system before equilibration and the flag "2" indicates the system which is equilibrated under the new thermodynamic conditions of the resulting bodies of a collision. Thus, all equilibrating atoms are conserved and the oxygen fugacity of the reaction is set implicitly. The mass balance equations for the major elements (Si, Al, Mg, Ca, Fe, Ni, and O) are iteratively solved by combining them with experimentallyderived expressions for their metal-silicate partitioning behavior. We use the predictive parameterizations by Mann et al. (2009) for Si and most other siderophile elements, Kegler et al. (2008) for Ni and Co and Frost et al. (2010) for O. For a detailed description of this coupled metal-silicate partitioning and mass balance approach, see Rubie et al. (2011).
In the chemical equilibrium of Equation 19, the partitioning of the elements into the core and mantle is controlled by the three parameters of the model: pressure P e , temperature T e , and oxygen fugacity f O2 of metalsilicate equilibration. These are in turn a function of the type of collision and its accretion efficiencies (Section 2.5), which control how the reservoirs of the target and projectile interact. In the following, the treatment of these thermodynamic properties in the context of in-efficient accretion and equilibration is described. For the case of no re-equilibration, P e , T e and f O2 cannot be defined as there is not a chemical reaction or an equilibrium between co-existing metal and silicate of known compositions.

Equilibration pressure and temperature
Following each event involving metal-silicate equilibration, the silicate-rich and the iron-rich reservoirs are assumed to equilibrate at a pressure P e which is a constant fraction f P of the embryos' evolving core-mantle boundary pressure P CMB , and the equilibration temperature T e is forced to lie midway between the peridotite liquidus and solidus at the equilibration pressure P e (e.g., . The pressure P e defined in Equation 20 is a simplified empirical parameter which averages the equilibration pressures for different types of impact events, and the constant f P is a proxy for the average depth of impactinduced magma oceans. Here, we adopt a value f P = 0.7 for all the accretion events, consistently with the findings by de Vries et al. (2016), which studied the pressure and temperature conditions of metal-silicate equilibration, after each impact, as Earth-like planets accrete. In case of full-requilibration following hit-and-run and erosive collisions, we assume f P = 1 (that is, P e = P CMB ) because the metal and silicate reservoirs are in a mixed state and re-equilibrate at a pressure equal to that of the core-mantle boundary.
For each of the resulting bodies, we determine the pressure P CMB by using Equation 2.73 in Turcotte & Schubert (2002). The radial position of the core-mantle boundary is computed by using the approximation that an embryo is a simple two-layer sphere of radius R consisting of a core of density ρ c and radius R c surrounded by a mantle of thickness (R − R c ): where the embryo's mean density ρ is provided by the density-mass relationship introduced in E20 normalized to predict the density of Earth (5510 kg/m 3 ) for a planetary mass of 1M ⊕ . Assuming a mantle density ρ m = 0.4 ρ c ( 2 ), the core density ρ c is equal to 2 The approximation that ρm = 0.4ρc follows from the ratio between the densities of uncompressed peridotite and iron: ρ peridotite /ρ iron = 3100 kg/m 3 /7874 kg/m 3 ∼ 0.4.
where Z is the embryo's core mass fraction as predicted by the surrogate models in Section 2.4. For an Earth-mass planet with core mass fraction of 30%, the assumption of two constant density layers as presented above provides P CMB = 130 GPa, which is just 4.4% lower than P CMB of the modern Earth (136 GPa, Turcotte & Schubert 2002). In the immediate aftermath of giant impacts, the internal pressure may be lower than that of the modern value due to the heat release in the impact and higher rotation rates of the planet; the pressure is nevertheless expected to increase to its steady state value due to subsequent de-spinning, heat loss into space and vapor deposition (Lock & Stewart 2019).

Oxygen fugacity
The oxygen fugacity f O2 determines the redox conditions for geologic chemical reactions. It is a measure of the effective availability of oxygen for redox reactions, and it dictates the oxidation states of cations like iron that have multiple possible valence states. For oxygenpoor compositions, oxygen fugacity is a strong function of the equilibration temperature because the concentration of Si in the metal strongly increases with temperature which increases the concentration of FeO in the silicate. For more oxidized compositions, both Si and O dissolve in the metal, and oxygen fugacity is a much weaker function of temperature than in the case of more reduced bulk compositions.
The major benefit of the mass balance approach to modeling metal-silicate equilibration as described in Section 3.2 is that the oxygen fugacity does not need to be assumed (as is done in most core formation models), but it is determined directly from the compositions of equilibrated metal and silicate (Equation 19). The oxygen fugacity is defined as the partition coefficient of iron between metal and silicate computed relative to the ironwüstite buffer (IW, the oxygen fugacity defined by the equilibrium 2Fe + O 2 = 2 FeO): where X represents the mole fractions of components in metal or silicate liquids. X Mw FeO is related to the silicate liquid composition (Rubie et al. 2011) and X met F e is the fraction of total iron in the bulk composition that is present as metal, as opposed to oxide (i.e., FeO in the silicate). In Equation 23, the activity coefficients are assumed to be unity because of the high temperatures, as is normally done when calculating f O2 in studies of core formation, and because their values are very poorly known at high pressures and temperatures. This assumption is discussed in detail by Gessmann et al. (1999).

INEFFICIENT ACCRETION VERSUS PERFECT MERGING: SINGLE IMPACT EVENTS
Here, we compare the predictions of the perfectmerging model and the inefficient-accretion model of Section 2 for the case study of a single giant impact between a target of mass M T = 0.1 M ⊕ and a projectile of mass M P = 0.7M T . For the two models, we analyze the core mass fraction (Section 4.1), accretion efficiencies of the cores and the mantles (Section 4.2), pressure, temperature, and oxygen fugacity of metal-silicate equilibration of the resulting bodies (Section 4.3). In the case of composition, we assume that the two embryos differentiated from early solar system materials that were chemically reduced. A high value of the fraction of total Fe in the system that is present in metal (X met Fe = 0.99) is justified in  as a condition to achieve reducing conditions so that elements such as Si and Cr partition sufficiently into the core. The refractory siderophile elements are assumed to be in solar-system relative proportions (i.e. CI chondritic, Palme & O'Neill 2003). This yields a core mass fraction which is approximately equal to 30% for both target and projectile, which is that of the colliding bodies in the SPH data of Section 2.1.

Core mass fraction
We show in Figure 3 that, in a collision between a target of mass M T = 0.1 M ⊕ and a projectile of mass M P = 0.7M T , the inefficient-accretion model predicts that the collision may result in two remnants and the resulting bodies' core mass fractions may be significantly different from those of their parent bodies depending on the characteristics of the collision (namely impact angle and velocity). In contrast, the perfect-merging model predicts that the outcome is a single, larger embryo of mass M T +M P whose core mass fraction is equal to that of the target, i.e., its bulk composition is unchanged.
Cases of erosive collisions and disruptive hit-and-run events result in a net increase in the core mass fraction of the largest remnant due to massive amounts of mantle loss, which is predicted by our inefficient-accretion model trained on these events. In hit-and-run collisions, the projectile's core mass fraction is also larger than the pre-impact value, exacerbating the erroneous approximations of the perfect merging model, as the latter does not produce two diverse objects as a result of a single (hit and run) collision.
In two regions of the parameter space (white areas in Figure 3), the core mass fraction of the collision remnants is predicted to be at most 4% and 3% less than the pre-impact values. This corresponds to a variation of ∼ −1 % in core mass fraction absolute value, which is within the noise floor of the surrogate models of core mass fraction (Figure 1).

Accretion efficiency
For any collision, geochemical modellers that use the perfect-merging assumption tend to approximate that the projectile's mantle and core accrete into the target's mantle and core, and undergo equilibration separately . For a collision between a target of mass M T = 0.1 M ⊕ and a projectile of mass M P = 0.7M T , the perfect-merging model predicts that the projectile's core plunges into the target mantle and that the entire projectile's mantle is accreted (ξ m L = ξ c L = 1), for every combination of impact angle and velocity. In contrast, our results show that there is a much larger diversity of core accretion efficiencies as a function of impact parameters, as shown in Figure 4, which demonstrates a critical inaccuracy of perfect merging and equilibration assumptions.
In the parameter space of impact angle and impact velocity, we identify five collision regimes according to the core and mantle's accretion efficiencies of the largest remnant. These are described in the bottom-left panel of Figure 4 and defined as: A: Core and mantle accretion occurs when ξ c L ≥ 0 and ξ m L ≥ 0. No second remnant is present because the projectile plunges into the target and gets accreted.
B: Core accretion with loss of mantle material occurs when ξ c L ≥ 0 and ξ m L < 0. No second remnant is present because the projectile's core plunges into the target and gets accreted. C: Core erosion occurs when ξ c L < 0. The target's mantle is catastrophically disrupted and core erosion may also occur. The largest remnant has a larger core mass fraction than the target ( The target's core does not gain or lose substantial mass, while the target's mantle may lose some mass depending on the impact velocity. The bulk projectile escapes accretion and becomes the second remnant. Substantial debris production may occur. E: Disruptive hit-and-run collisions occur in the rest of the parameter space. The target's loses part of its mantle and the largest remnant has a larger core mass fraction than the target (Figure 3). The H: Core erosion occurs when ξ c S <-0.1. The second remnant's core mass fraction is strongly enhanced with respect to that of the projectile. In disruptive hit-and-run collisions, the energy of the impact may be high enough to erode some core material. I: Projectile obliteration occurs when ξ S = ξ c S = ξ m S = −1. No second remnant exists, as the projectile is either accreted or completely disrupted.
In Figure 4, we also observe that the surrogate model can produce nonphysical/inconsistent predictions for certain combinations of parameters. For example, at high angle and low velocity the surrogate model predicts that the core is eroded with accretion efficiency ∼ −0.1 but mantle has an accretion efficiency near 0. This region is expected to be an artifact, since core erosion is expected to be accompanied by substantial mantle loss. This discrepancy, however, is within the noise floor of the surrogate model, which is estimated to be ∼ 0.15 in units of accretion efficiency after error propagation.

Pressure, temperature and oxygen fugacity
The equilibration conditions of planets resulting from inefficient accretion (due to the stripping of mantle and core materials) will differ from those produced by the perfect-merging model. In Figure 5 we show the difference in equilibration conditions for a single impact scenario (the same impact scenario as the previous sections); hit-and-run and erosive collisions are treated assuming full re-equilibration, for which the equilibration conditions are well-defined. The relative differences in the equilibration results are computed with respect to the perfect-merging values as δX = X inefficient accretion − X perfect merging X perfect merging × 100 %, where X indicates one of the thermodynamic metalsilicate equilibration parameters: pressure, temperature, or oxygen fugacity of metal-silicate equilibration for an embryo with initial X met Fe = 0.99. Positive values of δ[log f O2 ] indicate that the metal-silicate equilibration in the resulting body occurs at more chemically reduced conditions than in the planet from the perfectmerging model because of lower equilibration temperatures, while negative values of δ[log f O2 ] indicate less chemically reduced conditions. As a result of mass loss in the target, the inefficientaccretion predictions deviate substantially from the perfect-merging predictions in case of erosive collisions (regimes B, C, D in in Section 4.2) and disruptive hitand-run collisions (regime E). Conversely, when the collision is accretionary (regime A), the equilibration conditions predicted by the inefficient-accretion model are similar to those obtained with the perfect-merging model (δP e ≈ δT e ∼ 0 % and δ[log f O2 ] ∼ 0 %). Importantly, since the equilibration temperature is defined as a simple one-to-one function with respect to the equilibration pressure, as described in Section 3.2.1, the contours for temperature and pressure are very similar.

INEFFICIENT ACCRETION VERSUS PERFECT MERGING: N -BODY SIMULATIONS
In this Section, we investigate how planetary differentiation is affected by inefficient accretion during the endstage of terrestrial planet formation. In contrast to Section 4, in which we investigate the case study of a single giant impact, here we use the core-mantle differentiation model to interpret the results of the N -body simulations of accretion by E20 which model the evolution of hundreds of Moon-to-Mars-sized embryos as they orbit the Sun and collide to form the terrestrial planets. The goal of our analysis, however, is not to reproduce the solar system terrestrial planets, but to investigate whether or not the perfect-merging and inefficient-accretion models produce significantly different predictions for the terrestrial planets' final properties after a series of collisions under a fiducial N -body setup.

Initial mass and composition of the embryos
We use the data set of N -body runs performed in E20 and test the effects of the two collision models under a single equilibration model. The dataset consists of 16 simulations that use the more realistic treatment of collisions (inefficient-accretion model) and, in addition, 16 control simulations where collisions are taken to be fully accretionary (perfect-merging model). All the simulations were obtained with the mercury6 N -body code (Chambers 1999). For the inefficient-accretion model, E20 use the code library collresolve 3 . Each N -body simulation begins with 3 https://github.com/aemsenhuber/collresolve 153-158 planetary embryos moving in a disk with surface density similar to that for solids in a minimum mass solar nebula (Weidenschilling 1977). As in Chambers (2001), two initial mass distributions are examined: approximately uniform masses, and a bimodal distribution with a few large (i.e., Mars-sized) and many small (i.e., Moon-sized) bodies. The embryos in the simulations 01-04 all have the same initial mass (∼1.67 × 10 −2 M ⊕ ). In simulations 11-14, the embryos have their initial mass proportional to the local surface density of solids; the minimum mass is 1.59 × 10 −3 M ⊕ . In simulations 21-24, the initial mass distribution is bimodal and the two populations of embryos are characterized by bodies with the same mass; the minimum masses in the two populations are 1.79 × 10 −3 M ⊕ and 7.92 × 10 −2 M ⊕ . In simulations 31-34, the initial mass distribution is also bimodal but the bodies have mass proportional to the local surface density; the minimum embryo mass is 2.14 × 10 −3 M ⊕ .
The compositions of the initial embryos are set by the initial oxygen fugacity conditions of the early solar system materials which are defined as a function of heliocentric distance. Among the models of early solar system materials that have heritage in the literature, we adopt the model by , whose parameters were refined through least squares minimization to obtain an Earth-like planet with mantle composition close to that of the Bulk Silicate Earth. Accretion happens from two distinct reservoirs of planet-forming materials: one of reduced material in the inner solar system <0.95 au, with the fraction of iron dissolved in metal equal to 0.99 and fraction of available silicon dissolved in the metal equal to 0.20. Exterior to that, no dissolved silicon is present in the metal and the proportion of Fe in metal decreases linearly until a value of 0.11 is reached at 2.82 au. Beyond 2.82 au, the iron metal fraction linearly decreases and reaches about zero at 6.8 au (see Figure 6 in .
Within each group of N -body simulations from E20, which vary different aspects of the initial conditions, we compare results from the perfect merging and inefficient accretion scenarios. In sets 01-04 and 21-24, the initial mass of every embryo is identical, so the number density of embryos scales with the surface density. In sets 11-14 and 31-34, the initial embryos' masses scale with the local surface density; the spacing between embryos is independent of the surface density, but the heliocentric distance between them gets smaller as distance increases, hence the number density of the embryos increases with heliocentric distance. This means that the simulations 11-14 and 31-34 are initialized with most of the embryos forming farther from the Sun than those in simulations 01-04 and 21-24. Following the model by , this implies that most of the embryos in simulations 11-14 and 31-34 form with initial core mass fractions smaller than 30%.

Working assumptions
1. The N -body simulations of E20 are based on the assumption that the bodies are not spinning prior to each collision and are not spinning afterwards. We acknowledge that this approximation violates the conservation of angular momentum in offaxis collisions and that collisions between spinning bodies would alter accretion behavior (e.g., Agnor et al. 1999).
2. In the N -body simulations with inefficient accretion by E20, only the remnants whose mass is larger than 1 × 10 −3 M ⊕ are considered. Removing small bodies from the N -body simulations avoids uncertainties that may arise from querying predictions from the machine learning model in regimes on which is was not trained (the SPH dataset extends down to collisions with a total mass of 2 × 10 −3 M ⊕ ). If the surrogate model predicts a mass smaller than the threshold, then this body is unconditionally treated as debris and does not dynamically interact with the embryos. E20 nevertheless tracked the evolution of the overall debris mass budget.
3. As the embryos evolve during accretion through collisions, their core mass fractions can evolve to be different from 30%, which is the core mass fraction of the SPH colliding bodies that were used to train the surrogate models in Section 2.4. For this reason, in the analysis of the N -body simulations we make the approximation that the core mass fraction of each collision remnant prior to metalsilicate equilibration is equal to where Z * is the core mass fraction of the remnant as predicted by the surrogate model of Section 2.4, and Z 0 is the metal fraction of the parent body as computed by the core-mantle differentiation model. To guarantee the conservation of siderophile material, the prediction by Equation 25 is bounded to lie between 0 and 100%.

Results: Core mass fraction
To compare the effect of the two accretion model assumptions (inefficient accretion and perfect merging) for planets in the N -body simulations, we examine the final core mass fractions. We chose not to bin the data as we found the results to be highly sensitive to choices in bin width (low-N statistical issues). Figure 6 is a plot of the core mass fraction of the final planets as a function of planetary mass as determined by the perfect-merging model (open diamonds) and the inefficient-accretion model (dots). Each of the four subpanels of Figure 6 plots one of the four groups of N -body simulations (01-04, 11-14, 21-24, 31-34) described in Section 5.1. Each subpanel shows the results from all the four N -body simulations of the correspondent group. The inefficient-accretion results are colorcoded in terms of their h-number, which measures how many hit-and-run collisions an embryo experienced during the accretion simulation (Asphaug & Reufer 2014;E20). If the collision event is a merger, the largest remnant's h-number is equal to the mass average of the target's and projectile's h-numbers. If the event is a hitand-run collision, the second remnant's h-number is increased by 1, while the h-number of the largest remnant does not change. We also plot the estimated values for the core mass fractions of the inner solar system terrestrial bodies as red stars: Z Earth = 33 % , Z Venus = 31 %, Z Mars = 24 %, Z Mercury = 69 % (Sorokhtin et al. 2011;Helffrich 2017;Hauck et al. 2013, respectively).
Perfect merging simulations generally produce planets with core mass fractions close to 30% regardless of the initial embryo distribution. The most massive planets produced by inefficient accretion also have core mass fractions near 30%, but the degree of spread in core mass fraction increases substantially among the less massive bodies. Furthermore, remnants with core mass fractions above 40% are generally found to have relatively high hnumbers, meaning that they survived multiple hit-andrun collisions.
As discussed in E20, the initial mass of the embryos influences the dynamical environment and thus imparts a change in the degree of mixing between feeding zones in the planetary disk. This is found to affect the spread in core mass fraction of the smaller embryos. The simulations that produced the results in the top panels of Figure 6 were initialized with embryos of similar mass. As a result, these simulations are characterized by a predominance of collisions between similar-mass bodies which tend to result in more hit-and-run collisions (Asphaug 2010;Gabriel et al. 2020). By analyzing the evolution of the debris mass using Equations 14-17, we find that the debris is composed mainly of the mantle material of the embryos, with the core material contributing up to 15% when collisions are predominantly hit-and-runs in nature. This suggests that many of the less massive planets are the "runners" from hit-and-run collisions (blue and magenta colored points), which managed to escape accretion onto the larger bodies but lost mantle material in the process, an effect predicted in Asphaug & Reufer (2014). This also explains why remnants with high h-numbers tend to also have higher core mass fractions. Conversely, in N -body simulations that are initialized with embryos of dissimilar mass (bottom panels of Figure 6), the most massive bodies establish themselves dynamically early on, accreting smaller bodies in non-disruptive collisions. This leads to final bodies with relatively similar core mass fractions.
Small planets that have a low h-number (cyan colored dots) are either giant-impact-free embryos (i.e., small planets with core mass fraction ∼ 30%) or the largest remnants of disruptive collisions (i.e., small planets with core mass fraction > 30%). In reality, the latter would sweep up some of their debris (an effect not included in the N -body simulations by E20); this would lower the value of their core mass fraction, as mantle material is preferentially eroded in collisions. We also observe that the spread in core mass fraction in small bodies depends also on the heliocentric distance at which they form. Adopting the model for early solar system materials by , the embryos that form farther than 0.95 au have an initial core mass fraction lower than 30% due to oxidation of metallic iron by water. This explains the presence of a few small embryos with core mass fraction lower than 30%.
Finally, we find that the distribution of core mass fraction as function of mass is robust against the assumption of the degree of mixing and re-equilibration of the metal and silicate reservoirs in hit-and-run and erosive collisions. This is shown in Figure 7, left panel, which is a plot of the the core mass fraction values of the inefficientaccretion planets predicted using the assumption of full mixing and re-equilibration (vertical axis) versus the results assuming no re-equilibration (horizontal axis). The predictions are well-fit by the 1:1 line with coefficient of determination R 2 = 0.98. This means that the diversity in core mass fraction observed in Figure 6 is primarily set by the erosive effect of giant impacts, and that subsequent re-equilibration of mixed reservoirs (or lack thereof) do not affect the prediction of core mass fraction. In contrast, the chemical composition of the metal and silicate reservoirs still depend upon the adopted assumption. This is shown in Figure 7, right panel, which is a plot of the concentration of Si and O in the core of the inefficient-accretion planets predicted using the assumption of full mixing and re-equilibration (vertical axis) versus the results assuming no re-equilibration (horizontal axis). As opposed to the core mass fractions (left panel), the data points in the right panel of Figure 6. N -body results by the inefficient-accretion model (dots) and the perfect-merging model (diamonds) in terms of the final planets' core mass fraction as function of the final planetary mass and different initial mass distributions for the embryos. The core mass fractions of the inefficient-accretion planets are the arithmetic mean of the values obtained using the two assumptions of no re-equilibration and full re-equilibration of the metal and silicate reservoirs after hit-and-run and erosive collisions (Section 3.1). The red stars show the core mass fractions of the solar system terrestrial planets. Figure 7 are not well-fit by the 1:1 line. This warrants future studies on the nature and extent of mixing and reequilibration at the core-mantle boundaries, as further discussed in Section 6.3. 6. DISCUSSION

Statistics of planetary diversity
In order to quantify the observed diversity in smaller bodies, we compute the mean core mass fraction and the range in core mass fraction (i.e., maximum minus minimum values) of the two sub-populations of inefficientaccretion planets that have mass larger and smaller than 0.1M ⊕ ("more massive" and "less massive" planets, respectively, Table 2), and compare them to those of the perfect-merging planets. We use a cut-off value of 0.1M ⊕ because the perfect-merging simulations lack final planets smaller than 0.1M ⊕ (with the exception of 1 planet in the simulations 01-04), and because 0.1M ⊕ roughly corresponds to the maximum mass of the initial embryos of the N -body simulations from E20.
The statistics in Table 2 shows that the more massive inefficient-accretion planets tend to be fairly similar to one another in terms of core mass fraction, while there is a higher degree of diversity among the less massive planets. Futhermore, the more massive planets have core mass fraction statistics similar to that of the perfectmerging planets; this similarity is expected to be further enhanced if debris re-accretion is modelled, because debris tends to be preferentially accreted by the larger planets. Conversely, the average core mass fraction and its spread for the less massive planets tend to be higher than those of the perfect-merging planets because the former are preferentially eroded by giant impacts (Asphaug 2010). This is especially true when collisions are predominantly hit-and-runs in nature (e.g., results in the top panels of Figure 6).

Effect of debris re-accretion
In the N -body simulations of E20, debris produced in the aftermath of a giant impact that was not bound to the target or the runner was simply removed from the system. The mass of debris produced in a given giant impact is usually a relatively minor fraction of the colliding mass, compared to the major bodies, but cumulatively this simplification leads to significant mass removal from the system: up to 80% of the mass in the initial embryos (E20). Here we consider how this may affect our predictions of compositional diversity with size.
Other studies of terrestrial planet formation have treated giant impact debris explicitly (e.g., Stewart & Leinhardt 2009;Kokubo & Genda 2010;Chambers 2013;Quintana et al. 2016), although there is substantial difference from study to study in the treatment of debris production, the manner of its release into orbit, and the manner in which debris interacts gravitationally with itself and other bodies. In addition, there are differences in the calculation of the accretion efficiency of gi-ant impacts. Chambers (2013), whose initial conditions were replicated in E20, remains the most useful point of comparison, as it provides end-member cases of perfect merging, as well as simulations where debris is produced according to a scaling law (Leinhardt & Stewart 2012). The orbital release of debris in the simulations from Chambers (2013) is represented as a few particles radiating isotropically from the collision point at 1.05 times the target escape velocity, although it is unclear whether the debris fragments subsequently interact with each other or only with the major bodies.
With these approximations, Chambers (2013) found that planets took longer to reach their final masses, due to the sweep up of simulated debris particles, than in the cases with perfect merging. The mean time required for Earth analogues to reach their final mass is ∼159 Myr, substantially longer than 101 Myr under perfect merging. The simulations in E20 neglect debris but include realistic models of inefficient accretion, which results in a protracted tail of final accretion events lasting to ∼200 Myr. The longer timescale in this case is due to the realistic inclusion of hit-and-run collisions and the more accurate (stronger) orbital deflections for the runner in those simulations. Hit-and-run impactors in E20 appear to have served a similar role as the debris particles in Chambers (2013).
Of greater importance for the present study are the masses and compositions of the finished planets. Compared to perfect merging, Chambers (2013) found that more terrestrial planets (4-5) are produced, with a wider distribution of resulting masses, when collisional fragmentation is approximated as described. Simulations in E20 had a median number of 3 and 4 final planets under the perfect merging and realistic-accretion assumptions, respectively. The standard deviation in the final number of planets in E20 is 0.7 for the perfect-merging simulations, and 2 when applying the inefficient-accretion model.
The evolution and influence of debris can also have important dynamical effects on the accreting planets. Similar to the effects noted in O' Brien et al. (2006), the simulated debris particles in Chambers (2013) appear to have applied a dynamical friction to the orbits of the major bodies, leading to slightly lower eccentricities than those of the perfect-merging simulations. In light of this, neglecting debris in E20 is likely to have allowed a greater dynamical excitation of the growing planets, which subsequently would have collided at higher impact velocities. Half of the collisions in E20 had V coll > 1.6 V esc , and a quarter had V coll > 2.12 V esc , whereas 95% of the collisions in Chambers (2013) had velocities lower than 1.6v esc 4 . Further comparison of dynamical excitation and planet formation is difficult, because Chambers (2013) applied a hard cut-off to discern between accretion/disruption with debris production and hit-and-run collisions. In this simplified approach, regardless of impact velocity, a giant impact at impact angle greater than 30°is treated as a hit and run, and there is no gravitational deflection of the runner. In E20, by contrast, the close approaches and resulting angular momentum distribution between the colliding bodies are explicitly resolved. At relatively high velocity, most of the collisions at low impact angle predicted to be accretionary by Chambers (2013) are actually hit-and-run, as discussed in C19; conversely, at low velocity, many of the collisions predicted by Chambers (2013) to be hit-and-run are actually accretions.
Of paramount interest to the present work is how the approach of ignoring the debris influences our predictions for planetary diversity. Debris that are produced in a giant impact are on heliocentric orbits that can intersect the target, and can intersect the runner in the case of hit-and-run collisions. As argued by Asphaug & Reufer (2014), accretion occurs preferentially onto the most massive body, the post-impact target, which has the substantially larger collisional cross-section and sweeps up most of the debris. Re-impacts with the runner are less frequent, and furthermore are more likely to be erosive. Therefore, the inclusion of a proper treatment of debris-embryo interaction is expected to strengthen our conclusion, that planetary diversity increases at smaller scales.

Future work
In future studies, we will develop a recipe for the degree of re-equilibration, in-between the two end members studied here, as a function of impact conditions for giant impacts, building upon what it has been already achieved in previous studies (e.g., Dahl & Stevenson 2010;Nakajima & Stevenson 2016;Raskin & Morello 2019). This includes: (1) studying at which depth in the planet equilibration occurs, i.e., determining the value of f P (Equation 20) for different collision regimes; and (2) a more-advanced prediction for the equilibration temperature. In the planetary differentiation model, the latter is set at the midpoint between the solidus and liquidus temperatures of mantle peridotite. In this way, we have enforced that the temperature behaves like the pressure of metal-silicate equilibration. This assumption is probably accurate for the case of a projectile's core sinking through a mantle magma ocean, but proving its validity for hit-and-run or erosive collisions requires further study.
Future work will also aim to further improve the realism of the surrogate model of giant impacts. Specifically, we plan to train neural networks on SPH simulations of collisions between embryos with core mass fraction different from 30% and refine the treatment of the debris field in N -body simulations with respect to that used in E20. As discussed in Section 6.2, in addition to increase the mass of the surviving embryos, debris are expected to reduce the eccentricities and inclinations of the embryos via dynamical friction (Raymond et al. 2006;O'Brien et al. 2006;Kobayashi et al. 2019), thus increasing the accretion efficiency but reducing the interactions across feeding zones.

CONCLUSIONS
In this work, neural networks trained on giant impact simulations have been implemented in a core-mantle dif-ferentiation model coupled with N -body orbital dynamical evolution simulations to study the effect of inefficient accretion on planetary differentiation and evolution. We make a comparison between the results of the neural-network model ("inefficient accretion") and those obtained by treating all collisions unconditionally as mergers with no production of debris ("perfect merging").
For a single collision scenario between two planetary embryos, we find that the assumption of perfect merging overestimates the resulting bodies' mass and thus their equilibration pressure and temperature. Assuming that the colliding bodies have oxygen-poor bulk compositions, the inefficient-accretion model produces a wider range of oxidation states that depends intimately on the impact velocity and angle; mass loss due to inefficient accretion leads to more reduced oxygen fugacities of metalsilicate equilibration because of the strong temperature dependence at low oxygen fugacities.
To investigate the cumulative effect of giant impacts on planetary differentiation, we use a core-mantle differentiation model to post-process the results of N -body simulations obtained in E20, where terrestrial planet formation was modeled with both perfect merging and inefficient accretion. The inefficient-accretion model suggests that planets less massive than 0.1 M ⊕ are compositionally diverse in terms of core mass fraction. This is driven by the effect of mantle erosion that is included in the machine-learning model. In contrast, both models provide similar predictions for planets more massive than ≈0.1 M ⊕ , e.g., a tight clustering near 30-40% value of core mass fraction. This is consistent with previous studies that successfully reproduced Earth's Bulk Silicate composition using the results from N -body simulations with perfect merging (Rubie et al. , 2016. We therefore suggest that an inefficient-accretion model is necessary to accurately track compositional evolution in terrestrial planet formation, particularly when it comes to modeling the history and composition of less massive bodies. Our results improve upon previous studies that reached similar conclusions but did not find planets with core mass fraction comparable to that of Mercury (e.g., Dwyer et al. 2015) or whose simulations (specifically the simplification of close approaches) likely underestimated mantle erosion (e.g., Carter et al. 2015).
Finally, the value of oxygen fugacity of metal-silicate equilibration is known to influence the post-accretion evolution of rocky planets' atmospheres (e.g., Zahnle et al. 2007;Armstrong et al. 2019;Zahnle et al. 2020). Improving the realism of planet formation models with realistic collision models therefore becomes crucial not only for understanding how terrestrial embryos accrete, but also to make testable predictions of how some of them may evolve from magma-ocean planets to potentially habitable worlds.