Particle clustering in turbulence: Prediction of spatial and statistical properties with deep learning

We investigate the utility of deep learning for modeling the clustering of particles that are aerodynamically coupled to turbulent fluids. Using a Lagrangian particle module within the Athena++ hydrodynamics code, we simulate the dynamics of particles in the Epstein drag regime within a periodic domain of isotropic forced hydrodynamic turbulence. This setup is an idealized model relevant to the collisional growth of micron to mm-sized dust particles in early stage planet formation. The simulation data are used to train a U-Net deep learning model to predict gridded three-dimensional representations of the particle density and velocity fields, given as input the corresponding fluid fields. The trained model qualitatively captures the filamentary structure of clustered particles in a highly non-linear regime. We assess model fidelity by calculating metrics of the density field (the radial distribution function) and of the velocity field (the relative velocity and the relative radial velocity between particles). Although trained only on the spatial fields, the model predicts these statistical quantities with errors that are typically<10%. Our results suggest that, given appropriately expanded training data, deep learning could complement direct numerical simulations in predicting particle clustering within turbulent flows.


INTRODUCTION
The dynamics of aerodynamically coupled particles in turbulent fluids are of interest for astrophysical, atmospheric physics, experimental, and industrial applications (Toschi & Bodenschatz 2009).The coupling between the fluid and particle phases modifies the turbulent cascade, and leads to clustering, effective diffusion, and collisions among the particles (Saffman & Turner 1956).The generic problem has numerous physically significant variations, depending upon the regime of aerodynamic drag (Stokes or Epstein), the compressibility of the turbulence, the relative mass in particles versus that in the fluid, and the amount of fluid volume excluded by particles' finite size.
In the astrophysical domain, the response of solid particles to turbulence is important in protoplanetary disk physics and early phase planet formation.On large spatial scales, the ubiquitous substructure seen in dust continuum images of protoplanetary disks (van der Marel et al. 2013;ALMA Partnership et al. 2015;Andrews et al. 2018;Andrews 2020) is interpreted as concentrations of solids within gaseous vortices (Barge & Sommeria 1995;Bracco et al. 1999) and axisymmetric pressure maxima (Whipple 1972), structures which may themselves have formed as a consequence of disk turbulence (Johansen et al. 2009;Manger & Klahr 2018).On small scales, the collision velocity of small (micron to cmsized) particles is typically determined by how the particles interact with turbulence in the gas (Weidenschilling & Cuzzi 1993;Johansen et al. 2014).For small particles the collision velocity induced by turbulence increases with size.Because dust particles fragment (or bounce) above some threshold velocity (Blum & Wurm 2008), this behavior leads to the prediction of a maximum particle size that can result from coagulation.On intermediate scales, the two-way momentum transfer between particles and gas in disks leads to the streaming instability (Youdin & Goodman 2005;Krapp et al. 2019;Paardekooper et al. 2020;Yang & Zhu 2021), that may be responsible for the formation of planetesimals via gravitational collapse (Johansen et al. 2015;Simon et al. 2016;Li & Youdin 2021;Yang et al. 2017;Schäfer et al. 2017).
The various manifestations of particle-gas interactions in protoplanetary disks can be studied using analytic methods (e.g.Voelk et al. 1980;Ormel & Cuzzi 2007;Pan & Padoan 2010;Krapp et al. 2019;Lin 2021), and by simulations that treat the particle phase as either discrete Lagrangian particles (e.g.Yang & Johansen (2016)) or as a second fluid (e.g.Bracco et al. 1999;Pan et al. 2011;Ishihara et al. 2018;Bhatnagar et al. 2018;Krapp et al. 2020;Sakurai et al. 2021;Huang & Bai 2022).Although powerful, these methods do not suffice to answer all of the physically interesting questions that arise in planet formation studies.Numerical simulations can only model a small subset of the full range of spatial scales, leaving open questions as to how interactions on different scales combine (Chambers 2010;Hartlep & Cuzzi 2020).Even when only a limited range of scales need to be studied, it is computationally expensive to run models that span the range of plausible size distributions for particles in protoplanetary disks.Machine Learning (ML) approaches may provide new theoretical tools to study these problems.In analogous cosmology applications, ML has been used to generate realistic predictions of gas properties from dark matter only simulations, and to interpolate reliably from grids of simulations that sparsely cover the full parameter space (Villaescusa-Navarro et al. 2021).ML architectures have also been combined with traditional turbulence modeling techniques, such as Reynolds-averaged Navier-Stokes (RANS) models and large eddies simulation (LES) models, to improve models of the turbulence properties and turbulence-particle interactions predicted by coarsescale, non-ML models (Ling et al. 2016;Xie et al. 2019;Wang et al. 2017;Wu et al. 2022;Davydzenka & Tahmasebi 2022).These ML methods are able to recover the statistical properties of turbulence and particle clustering over a reasonable range in parameter space of interest.In planet formation it might be possible to extend the range of explicit simulations with an effective ML sub-grid model, and to make predictions for arbitrary particle size distributions using a fairly small set of input simulations.
In this paper we present an initial investigation into how well established ML methods work for predicting the spatial and statistical properties of particles in turbulent fluids.We consider the response of a single particle size to isotropic forced turbulence, in a periodic domain, with aerodynamic drag in the Epstein regime.This setup excludes the forces that lead radial and vertical epicyclic oscillations of free particles, which are physically important (Youdin & Lithwick 2007).It is, however, the simplest model problem that is relevant to studies of the rate and velocity of particle collisions on small scales in protoplanetary disks.Clustering in this regime is well-studied, and is understood to result from the interaction of particles with small-scale fluid vortices (e.g.Johansen et al. 2014).This physical understanding does not, however, lead to a predictive model for the particle distribution.The collision velocity, on the other hand, is amenable to an analytic model (several exist, the most common in disk applications being Ormel & Cuzzi 2007).We generate training data using a recently developed Lagrangian particle module within the Athena++ code (Yang et. al., in prep.), and use a U-Net network architecture (Ronneberger et al. 2015) to learn the mapping between the three-dimensional fluid fields and gridded representations of the particle data.We then assess the fidelity of the predictions for both statistical properties of the particles (the strength of clustering and the relative velocity as a function of scale), and for their instantaneous spatial distribution.
The plan of this paper is as follows.In §2 we describe the Athena++ code, the implementation of Lagrangian particles within the code, and the setup of the forced turbulence simulations.The Machine Learning methods are described in §3.Sections §4 and §5 quantify the network performance for the particle density and velocity fields, and for statistical properties that would be relevant to the calculation of the rate and mutual velocity of particle collisions.We present some generalization tests in §6 and conclude in §7.

Physics of turbulent clustering
To aid in the subsequent discussion of clustering statistics, we briefly introduce some relevant physics of turbulence and turbulence-driven dust dynamics.
In the standard description of fluid turbulence (Kolmogorov 1941(Kolmogorov , 1991)), the statistics of turbulence within the inertial range depend only on the mean energy transfer rate ⟨ϵ⟩.In this limit, the two point relative velocity ⟨δ⃗ u • δ⃗ u⟩ at separation ℓ scales as ⟨ϵ⟩ 2/3 ℓ 2/3 .In the inertial range, dissipation due to viscosity is negligible and energy cascades to smaller scales.The scale at which viscosity becomes important can be found by equating ⟨ϵ⟩ to the energy dissipation rate ν 2 ≈ ⟨ϵ⟩ (Ching 2014), from which one obtains the Kolmogorov dissipative length scale, timescale and velocity scale as ℓ η = (ν 3 /⟨ϵ⟩) 1/4 , τ η = (ν/⟨ϵ⟩) 1/2 , and u η = (νϵ) 1/4 respectively.Now consider dust particles coupled to such a turbulent flow.If friction is the dominant force driving the dust, the equation of motion of a dust particle takes the form, where ⃗ v p is the velocity of particles, ⃗ v g is the velocity of gas, and τ s is the friction time characterizing the strength of coupling between the dust and gas.For dust particles in the Epstein regime 1 , that is, with size a p much smaller than the mean free path λ g of the gas molecules (a p ≪ λ g ), the friction time is given by (Weidenschilling & Cuzzi 1993), where ρ d is the material density of the dust, ρ g is the fluid density, and c s is the local speed of sound of the gas.
The clustering behavior depends on how τ s compares with a characteristic time-scale of the turbulent flow.A commonly used time scale relevant for our purposes is the Kolmogorov time-scale τ η (Sundaram & Collins 1997;Pan et al. 2011;Ishihara et al. 2018).It represents the turnover time of the smallest eddies.The ratio of these time-scales is the dimensionless Stokes number St = τ s /τ η 2 .Particles with St ≪ 1 are well coupled to the gas at all scales and behave like trace particles.With very little memory of its previous motion, the clustering will be mainly determined by the velocity divergence of the surrounding flow at one point in time.It has been shown that the particle velocity divergence i ∂ i v i p ≈ St/τ η (Pan et al. 2011), where i is the spatial index.This implies that for St < 1 the clustering intensifies with Stokes number.
Conversely, when St > 1, turbulent fluctuations are too rapid for the particles to respond.This has two effects.First, as the friction time τ s increases, the history of the particle has an increasingly large influence on the particle relative velocity.This can be seen in the formal solution of equation (1) (Pan & Padoan 2013), where X(t) is the position of the particle at some time t and t 0 is a reference time chosen to be long enough (i.e.t − t 0 ≫ τ s ) so that the velocity at time t 0 has been "forgotten" by the particle.This solution implies that the particle's current velocity depends on the past history of the gas velocity within a time window of ≈ τ s .Hence, one should expect that for St > 1 the local gas velocity divergence, and also the St/τ η scaling discussed above, should no longer be adequate for describing particle clustering.Furthermore, particles with St > 1 respond to eddies of different turnover time selectively, giving rise to a characteristic clustering scale ℓ τs (Pan et al. 2011).Since τ s > τ η , 1 In this regime the drag force is linear in the velocity difference between the particle and the fluid, rather than quadratic as in the Stokes regime typically relevant to terrestrial fluids 2 In protoplanetary disk studies, the "Stokes number" is often defined instead using as a characteristic time-scale the dynamical time, Ω −1 K , where Ω K is the Keplerian orbital frequency.Because our simulations do not include shear, we use the definition appropriate for fluid turbulence.some of the eddies have turnover time smaller than τ s .Particles at these scale are decoupled from the flow and experience random kicks due to the flow.This inhibits clustering at those scales.On the other hand, particles with turnover time much longer than τ s are well coupled to the eddies.This depresses clustering at large scale.Hence, the most significant clustering is expected to occur when τ eddy ≈ τ s .In the inertial range, τ eddy ≈ ℓ 2/3 ⟨ϵ⟩ −1/3 , where ℓ is the characteristic size of the eddies and ⟨ϵ⟩ is the mean energy injection rate.Equating τ eddy with τ s allow us to obtain the characteristic scale of the clustering ℓ τs ≈ ⟨ϵ⟩ 1/2 τ 3/2 s = St 3/2 ℓ η , where ℓ η is the Kolmogorov length.

Forced turbulence simulations
We use the ATHENA++ magnetohydrodynamics (MHD) code (Stone et al. 2020) to calculate the training data for our study.The fluid system solved represents the equations of compressible, inviscid hydrodynamics with an isothermal equation of state, Here, P * is a diagonal tensor with components P gas , and c s is the constant speed of sound.In common with Pan et al. (2011), who also used a Godunov-type numerical scheme, the simulations do not include any explicit viscosity (either a Navier-Stokes viscosity or a hyperviscosity).Dissipation of kinetic energy occurs via numerical diffusion at the grid scale.This choice maximizes the extent of the inertial range in the simulated turbulence -important given the moderate resolution we are using -but means that the effective viscosity and corresponding Reynolds number can only be estimated using established scaling relations for turbulence (Pan et al. 2011).All simulations are conducted using a uniform 3D Cartesian grid for the gas variables with a resolution of 256 cells in each spatial direction.We use the Harten-Laxvan Leer-Einfeld (HLLE) Riemann solver combined with a second-order accurate van Leer predictor-corrector scheme for time integration.Spatial reconstruction is done using the 3rd order piecewise parabolic method (PPM).We use periodic boundary conditions in all directions.
The gas is initialized with constant density ρ gas = 1 and with all velocity components v i = 0.The speed of sound c s = 1.To generate a turbulent velocity spectrum for the gas, we use the FFT turbulence driver module of ATHENA++, which continually feeds energy into the domain at the largest scales by adding large scale velocity perturbations to the flow.The scale of these perturbations is controlled by the parameters k min and k max , denoting the smallest and largest wavenumber to be perturbed.(Here wavenumber is defined as k = 1/λ, without a factor of 2π).Unless otherwise specified, we use k min = 1 and k max = 5 with simulation box size L x = L y = L z = 1, which allows the turbulent energy spectrum to develop naturally into a Kolmogorov-like spectrum, with numerical dissipation occurring near the Nyquist frequency.We note that, due to the limited numerical resolution of the training simulations, the obtained energy spectrum of the turbulence is not a standard Kolmogorov type spectrum.We exclude data from an initial transient state from our ML dataset.The amount of energy injected per perturbation interval (which we set to be equal to one simulation time-step) is given by the energy injection rate ϵ inj and is set to ϵ inj = 0.005 (except where specified).The RMS gas velocity calculated from the simulation is v rms g ≈ 0.104, which is ≈ 10% the speed of sound.The turbulence, although computed with a compressible code, is thus, for practical purposes, almost incompressible.As we are investigating subsonic turbulence, we enforce solenoidal turbulence driving by suppressing the compressive component of the velocity perturbation.

Lagrangian dust particles
We use the Lagrangian dust particle module of the ATHENA++ code (Yang et. al., in prep.) to simulate the particle dynamics.In all our simulations, 256 3 dust particles are used.For simplicity, we only treat the drag force exerted on the dust particles by the gas and neglect the back-reaction of the particles on the gas, which is a good approximation in cases with low dust-to-gas mass ratios.To interpolate between the gridded values of the gas component and the particle positions, the module uses the triangular shaped cloud (TSC) algorithm.Unless otherwise specified, we use a particle friction time of τ s = 0.1.
From the simulated output, we estimate the turbulent parameters of the fiducial test case to be: ⟨ϵ⟩ ≈ 3.17 × 10 −3 , ν ≈ 1.95 × 10 −5 , ℓ η ≈ 1.24 × 10 −3 , τ η ≈ 7.84 × 10 −2 and u η ≈ 1.57 × 10 −2 , all in the relevant simulation units.The macroscopic Reynolds number of the fluid simulation is computed to be Re ≈ 3400.The particle clustering length is estimated to be ℓ τs ≈ 1.58 × 10 −3 .The energy injection rate ⟨ϵ⟩ and eddy turnover time τ η are estimated from the third-order structure function and strain-rate tensor respectively, the particle friction time is taken directly from simulation setup, and the rest are obtained from their relationships with these three variables.

Dataset
Snapshots of the simulation described above are used as the training set of the model.The temporal separation between the frames is chosen to be long enough so that adja-cent frames are uncorrelated.The gas density field ρ g , gas velocity field ⃗ v g , particle density field ρ p , and particle velocity field ⃗ v p are used to train and validate the network.All fields are gridded on a 256 3 mesh.(The gas fields are natively gridded; the particle data is mapped to the same grid).The first two serve as the model's inputs, and the latter as outputs.The particles are gridded because our network architecture and data augmentation techniques (See §3.3.1, §3.3.2 for details) work on gridded inputs and outputs only.

Data preprocessing and Normalization
Data preprocessing and normalization steps are applied to the input and output fields to aid the training.

Input fields
The gas density ρ g is log normalized, where ρ * g is the normalized gas density and ρ g is the gas density.The log normalized density field helps separate gas density of various scales, allowing the network to better perceive the input field.No preprocessing is applied to the gas velocity v g .In preliminary experiments, we considered preprocessing the velocity field to show the network the gas vorticity, based on the physical argument that particle clustering depends on how the particles respond to vortices of varying scale.Somewhat surprisingly, we found that a network trained using gas vorticity ∇ × ⃗ v g as input produces visually indistinguishable outputs from one trained directly on gas velocity.While a qualitative study of the relevant statistics has not been performed on these models, we believe the visual comparison shows that the network is powerful enough to itself extract the physically relevant quantities from the velocity field.We will therefore not consider networks trained with vorticity (or velocity divergence) inputs in the remainder of the paper.

Output fields
The particle density field ρ p and velocity field v p are nongaussian (e.g.Ishihara et al. 2018).Therefore, data normalization and preprocessing techniques are employed to aid the training of the neural network.
For the particle density field ρ p we apply the normalization, Here, ρ * p is the normalized particle density and ρ 0 = 0.14, which corresponds roughly to the average particle density, is empirically chosen to make the distribution of normalized density ρ * p most symmetric.The mapping between ρ * p and ρ 0 is one-to-one, allowing us to invert ρ * p using a softplus-like function, For the particle velocity field ⃗ v p , the following normalization is applied, where ⃗ v g is the gas velocity, ⃗ v p is the particle velocity, and ⃗ v rel is the relative velocity.The motivation for this is to try to prevent the bulk gas velocity ⃗ v g from masking the fine details of the particle velocity ⃗ v p by training the network (referred to later as UNET-R) on the relative velocity ⃗ v rel .Intuitively, this helps the network focus on the physically interesting differences between the particle and gas velocities, instead of learning to copy the background flow.To evaluate the effectiveness of this strategy, we also trained a network directly on ⃗ v p (later, UNET-V) and compared it against that trained on ⃗ v rel .Details of the comparison are presented in §5.

Model
The network architecture U-Net, first developed by Ronneberger et al. (2015) for biomedical image segmentation and later generalized by Milletari et al. (2016) to handle volumetric data, is used.U-Net is a "fully convolutional network" that is capable of taking a multichannel, 3D input block and producing an output block of similar spatial size but with a different number of channels.It exhibits translational equivariance -that is, its predictions depend only on the relative positions of the grid points (Long et al. 2015).This is desirable because it is consistent with the physical picture of homogeneous turbulence (Ching 2014) and because it allows the network to be applied to input fields of different spatial sizes.The U-Net has been successfully employed in cosmology for tasks including super-resolution (generating high-resolution predictions from low-resolution simulations) (Schaurecker et al. 2021), and prediction of galaxy distributions from the dark matter density field (Yip et al. 2019).
The network is implemented using the map2map neural network emulator with the architecture specified in Fig. 1.The network consists of two major parts, the "compression part" and the "decompression part".The former consists of successive convolutional and downsampling layers that compress information of the input field into a smaller size latent representation, while the latter consists of successive levels of convolutional and up-sampling layers that expand the latent information into the output block.Information extracted in different levels of the network is incorporated by concatenating outputs of the compression part and those of the expanding part on the same level.

Training
The simulation frames, 252 in total, are randomly divided into training and validation sets in a 4:1 ratio.Training is done using only the frames in the training set.The entire training set is iterated through during each epoch of training.
During training, for each frame in the train set, a 1283 block is cropped from the input fields at 8 floating anchors 3 .Each 128 3 block is padded 20 grids in all directions to produce input fields of dimension (4,168,168,168), with three of the input channels coming from the spatial components of the gas velocity v (x/y/z) g and one from the gas density ρ g .During both cropping and padding, periodic boundary conditions are applied when the cropping or padding array index exceeds the valid range.Discrete rotation and parity transforms are also applied to the input block using map2map.
The data augmentation process serves several purposes.Practically, the cropping allows the input to fit onto one GPU, which enhances parallelization.Physically, cropping the data about floating anchors and applying rotation and parity flips helps the network capture the turbulent flow's transitionally symmetric and isotropic nature.Finally, the padding helps the network to preserve complete translational symmetry across the boundary and removes boundary artifacts in the output fields (Schaurecker et al. 2021).
The input enters the U-Net and produces an output field of size (4, 128, 128, 128): One of the channels represents particle density ρ p , and the rest represent particle relative velocity ⃗ v rel .Finally, the whole (4, 256, 256, 256) field is reconstructed from the output patches produced by map2map.
For the loss function, we choose to minimize the mean square error (MSE) loss over all output channels.Weight decay is also used.The loss function, which is minimized using the Adam optimizer (Kingma & Ba 2014), takes the following form, where y tgt , y out represents the target and output fields, B is the batch size, λ is the weight decay, and ∥w∥ is the square norm of weights.Unless otherwise specified, all training runs in this paper use learning rate 10 −4 , weight decay λ = 10 −4 , and batch size B = 1, for around 400 epochs until the results are stable.
After training, we evaluate the model by comparing the output with the target over a variety of statistics.The results are summarized in §4 and 5. frame basis.Snapshots of selected frames are presented in Fig. 2. Because the densities span a wide range, a logarithmic color map is chosen to aid visualization.We also show the normalized cross correlations 4 of these frames, defined as in Fig. 3.As shown in Fig. 2, both machine learning outputs (UNET-V, UNET-R) produce density distributions that are similar to the ground truth (ATHENA++).Locations with voids in the density distribution align, and the filamentary structures of the dust are captured.However, both predicted outputs are less sharp than the target field, indicating that the algorithm has smoothed out the densities.The density at the locations of peaks is also less pronounced, indicating an underprediction in those areas.These observations are borne out in Fig. 3, where we see that the peak values of normalized cross correlations for both UNET-V and UNET-R are only ≈ 0.6.The larger peak values for UNET-R show that UNET- 4 The normalized cross correlation of two function f, g is defined to be which is a normalized inner product of f, g.For volumetric data, the result is radially binned.
R, trained on the relative velocity, is slightly better at predicting ρ p than UNET-V.

Radial Distribution Function (RDF)
We compute the particle radial distribution function (RDF) g(r) of the simulated and U-Net-produced outputs to study the difference in their spatial distributions more carefully.Let n be the average dust density.The RDF is defined so that the average number of dust particles P (r) inside a volume element dV at a distance r relative to a reference particle is given by, Although this definition can be applied directly to compute the RDF, a faster way of obtaining it on a grid is available.First, we compute the particle density correlation function (DCF) ξ(r), which is defined as, Then we use the relation g(r) = 1 + ξ(r) between the two quantities to obtain the RDF (Shaw 2003).This allows for an efficient computation, as Eq. 13 can be computed quickly using the fast fourier transform (FFT).Unless otherwise specified, all RDFs computed in this paper are based on this method.The use of grid-based metrics, rather than metrics computed directly from the particle distribution, differs from most prior work.Differences in results that arise between the method based on Eq. 12, and that based on Eq. 13 for the simulated particle density, are discussed in Appendix A.
In the left-most panel of Fig. 4, we show the RDFs of ρ p produced by the simulation (ATHENA++), and by the U-Nets (UNET-V, UNET-R).Recall, UNET-V is the U-Net trained directly on v p , while UNET-R is trained on v rel .The features of the RDFs reflect what we see in the slice plots.The simulation and U-Nets agree well for r ≳ 0.01, indicating that the large scale features are captured.For r ≈ 7 × 10 −3 , a dip of roughly 5% is observed, indicating the underestimation of ρ p at small scales.The over-prediction for r ≳ 7 × 10 −3 might be due to the smoothed density around the peaks.Overall, the error in RDF statistics is bounded by ±5%.Although we have picked only one frame for discussion, the features observed are general.Discussion on the consistency across frames can be found in Appendix B.

Visual Comparison
In Fig. 5, we display slices of the three spatial components of the relative velocity field v rel produced by the simulation and by the U-Nets.Recall that the particle velocity v p can be thought of as having two contributions: (i) a gas velocity term v g , which represents the velocity of the underlying flow driving the dust, and (ii) a relative velocity term v rel = v p − v g , which measures the deviation of the particle velocity from the gas velocity due to the imperfect coupling between them.To prevent v g from overshadowing v rel in visual comparisons, we show v rel .
From Fig. 5, we see that both UNET-V and UNET-R produce v rel fields that are qualitatively similar to that of the ATHENA++ ground truth.It is difficult to tell which is better solely by visual comparison.For example, at around (v 3 rel , 0.5, 0.6), UNET-V produce a sharper output for places where that component peaks, whereas at around (v 3 rel , 0.1 − 0.3, 0.6 − 0.9), UNET-R captures the zeros of the field better.This motivates us to use more sophisticated statistics in §5.2 and §5.3, to assess which of them performs better.

Relative velocity ⟨w
The relative particle velocity ⟨w 2 ⟩ 1/2 is a commonly studied statistic of particle clustering in turbulence.Following  ) ATHENA++ shows the RDF computed directly from the simulation, UNET-R is the U-Net output trained with relative velocity between gas and dust, and UNET-V is the U-Net output trained directly on dust velocity.The first row compares the RDF of the simulation output and the U-Net prediction.The second row shows the ratio between the outputs and the target to better quantify the error.The zoom-in plot in the lower-right panel shows the peak of the error at small scales.Ishihara et al. (2018), we define it as, where r is the separation between the particle pairs, ⃗ V 1 , ⃗ V 2 are the velocities of the particles located at ⃗ r 1 , ⃗ r 2 and N p is the number of pairs of particles.
To compute Eq. 14 on the grid, we convert the average into a weighted one using the relevant particle density ρ p as weight.More precisely, let p be a grid cell and C p (r 1 , r 2 ) be the set of all grid cells whose center point lies inside the spherical shell with inner and outer radius r 1 and r 2 that is centered at p. Then a grid average of some quantity Q, weighted using particle density ρ p , is defined as, where ρ p (q) and Q(q) are understood as the grid values at cell q.Using this definition, we write for the ground truth relative velocity and for the output velocity field.
Unlike grid RDFs, computing these quantities on the grid is relatively expensive.To compute the statistic for the whole grid r ∈ [256 −1 , 1], we would need to check every grid point against every other grid point.If N is the number of grid points along one axis, this would imply a complexity ≈ (N 3 ) 2 = N 6 .To reduce the computational cost, we restrict r ∈ [256 −1 , 10 −1 ] for all relative velocity statistics.This is not a serious restriction because for many applications it is the accuracy of relative velocities at small scales that is the most important.Furthermore, from the estimated clustering scale in §2.3, we see that the upper bound of the radii r = 10 −1 ≫ ℓ η ∼ 10 −3 , so the relative velocity curve for r > 10 −1 is likely not relevant for our application.Again, possible errors introduced by grid averaging are explored in Appendix A.
We plot ⟨w 2 ⟩ 1/2 and its error as a function of r in Fig. 6.The upper left panel shows that the relative particle velocity fields produced by the UNET-V, UNET-R, and the ground truth (ATHENA++) are similar in shape, and the lower left panel shows that both models have their error bounded by ap- proximately 15% above and 7% below.For r ≳ 2.5 × 10 −2 , both models under-predict the relative velocity, while for r ≲ 2.5 × 10 −2 both over-predict.The absolute error for the UNET-R model is always less than that of UNET-V at all length scales.In particular, the errors in UNET-R's prediction are ≈ 2.5% less than UNET-V's at the smallest and largest scales.Furthermore, the region around the point where UNET-R's error curve crosses unity, especially that toward the tail of the curve, is much flatter than that of UNET-V's.This indicates the error varies less with separation in UNET-R.
The results above support our hypothesis that targeting the relative velocity aids in the network training.UNET-R can capture the radial dependence of relative velocity in the predicted particle velocity field within a maximum error of approximately 10%.Although this error is systematic and scale-dependent rather than random, we consider this to be good performance.
Comparison of relative velocity between dust particles ⟨w 2 ⟩ 1/2 for different simulation setups (see §2.2,2.3 and §6 for details).ATHENA++ represents the relative velocity computed from the simulation, UNET-R is the U-Net output trained with relative dust velocity, UNET-V is the U-Net output trained directly on dust velocity, and VG is the average gas velocity weighted by particle density (i.e. the relative velocity curve one would have obtained if the particles follow perfectly the flow).The left-most panels show the fiducial results, the other panels show the results of generalization tests.
In the figure, we see that both the relative particle velocity ⟨w 2 r ⟩ 1/2 predicted by UNET-R and UNET-V have similar shapes to the target.However, the performance of the U-Nets varies at different scales.For r ≳ 2.5 × 10 −2 both models perform similarly and predict a radial velocity around ≈ 2.5% above and ≈ 5% below the target respectively.The two models over-predict at scales r ≲ 4×10 −2 , being ≈ 10% and ≈ 12.5 larger respectively.Unlike ⟨w 2 ⟩ 1/2 , the error curves of the models do not intersect at unity.However, UNET-R's error remains smaller than UNET-V's across most scales, except for a small region around r ≈ 3 × 10 −2 .Also, while both UNET-R and UNET-V overpredict the radial velocity at small scales, that of UNET-V is much closer to the curve labeled VG-the average gas velocity weighted by particle density-than UNET-R.This indicates that UNET-V is essentially outputting the gas velocity v g as the particle velocity at small scales, instead of learning the particles' deviation from the gas flow.This further demonstrates that training on v rel is effective in helping the network to single out how the particle velocity differs from the background flow.
Finally, unlike the relative velocity, which shows a flat part at intermediate scales, there are no obvious flat regions on UNET-V's error curve.This suggests the network in general predicts the direction of v g less accurately than its amplitude.

Error estimation
The predicted particle density and velocity fields, and their associated statistics, show good but not perfect agreement with the ground truth derived from the simulation data.The errors are systematic and scale-dependent.It is of interest to ask: how much of this error is due to uncertainty in the deep learning model?Addressing this question is difficult, and various approximate methods have been used in the machine learning community.We have experimented with two popular frameworks, (i) Monte-Carlo Dropout (Gal & Ghahramani 2016;Zhu & Laptev 2017) and (ii) stochastic weight averagingGaussian (SWAG) (Maddox et al. 2019).Details are given in Appendix C. In summary, we find evidence using the SWAG approach that model uncertainty has a larger effect on the velocity predictions than on the density predictions.However, the magnitude of the estimated model uncertainty is significantly smaller than the size of the discrepancy between the model predictions and the ground truth.The source of the discrepancy thus remains unclear, but it is clearly of a systematic nature.

GENERALIZATION TESTS
Comparison of relative radial velocity ⟨w 2 r ⟩ 1/2 for different simulation setups (see §2.2,2.3 and §6 for details).Refer to Fig. 6 for the description on the abbreviated labels.
Our deep learning model, trained on a single simulation of forced turbulence with a specified set of parameters, cannot be expected to generalize to distinctly different regimes of particle-gas coupling.However, it may have learned enough about how particles interact with turbulent fluids to behave robustly when confronted by modestly different turbulent conditions.To test our model's ability to generalize, we have designed three test cases with physical parameters different from the fiducial case discussed so far.The test cases are: (1) turbulence driven with a different range of cut off wavenumbers k min , k max ; (2) turbulence driven using a different energy injection rate of ⟨ ė⟩ = 5 × 10 −4 ; (3) particles with stopping time τ s 100 times smaller than the original simulation.ATHENA++ simulations with these different parameters, but otherwise similar to the fiducial run, were performed for each test case.
For each of the test cases, we feed their simulation frames into the UNET-R trained from the fiducial run without any additional training, and produce field predictions for ρ p and v p .Then, using the same statistics code in §4.2, §5.2 and §5.3, we compute the relevant statistics from the field outputs and compare them with the ground truth.

Different wavenumber cut-off
In this generalization set, different minimum and maximum cutoff of driving wavenumbers k are used.Instead of the original (k min , k max ) = (1, 5) combination, we remove the largest scale driving and use the combination (3, 8).The resulting gas velocity power spectrum is illustrated in Fig. 8.The dip in power at around k = 1 − 3 confirms that the new flow has less power in large scales compared to the original flow.
The RDF is computed and plotted in the second column in Fig. 4.In the figure, we notice that the U-Net predicted RDF resembles closely that of ATHENA++.Furthermore, from the ratio plot, we see the discrepancy between the two densities takes a similar shape as that produced on the original validation set.
As mentioned in §2.1, the particle clustering is most responsive to eddies with size ℓ ≈ St 3/2 ℓ η .Therefore, the absence of large-scale driving should not affect the resulting clustering.This test indicates the network can pick up the correct scales in the power spectrum responsible for the clustering.
The relative velocity ⟨w 2 ⟩ 1/2 and the radial relative velocity ⟨w 2 r ⟩ 1/2 behave similarly to the density.Due to the energy cascade inherent to Kolmogorov turbulence, it is expected that the removal of large-scale driving in this test case should not affect the small-scale relative velocity.Therefore a similar target relative velocity profile is expected.This is verified in the second plot in Fig. 6 and Fig. 7. Furthermore, the network also outputs a similar relative velocity as the target.The relative velocity of the predicted v p is relatively flat for r ≳ 2 × 10 −2 and raises for scales smaller than r ≲ 2 × 10 −2 .The "flat" part is at most ≈ 3% lower than the ground truth and the "peak" is at most ≈ 12% above the target just like that in the normal validation set.The predicted relative velocity also differ from VG by ≈ 5%, which indicates the network can discriminate gas velocities from dust velocities.This shows that the network correctly ignores the large-scale driving that is irrelevant to the relative velocity on small-scales and identifies the wave-numbers that are relevant for the problem.As a result the network is able to predict the relative velocity field with similar quality as the one it is trained on.
The radial component shows very similar character to the relative velocity.From Fig. 7, we see that at the largest scale the prediction is ≈ 1% below the ground truth and at the smallest scale it is ≈ 10% larger.However, unlike the ⟨w 2 ⟩ 1/2 predictions, the difference between the predicted radial velocity and VG is smaller in this case.Still, the predictions made by the network have similar quality as that in the fiducial case.This shows that for all statistics above, the network is indeed able to find the correct wave-numbers that are responsible for the relative velocity.

Different energy injection rate
In this generalization test, we drive the turbulence with an energy injection rate that is 10 times smaller than that of the fiducial case (⟨ϵ⟩ = 5 × 10 −3 ).As shown in §2.1, the length scale at which the strongest clustering occurs is given as ℓ τ ≈ ⟨ϵ⟩ 1/2 τ 3/2 s .With ⟨ϵ⟩ → 0.1⟨ϵ⟩, the clustering length scale should be 0.1 1/2 ≈ 0.32 times the original.Hence, the dust clustering structure becomes finer and the RDF should rise at smaller scales.Since our statistics are limited to grid resolution r ≈ 256 −1 , the RDF instead ap-pears flatter.This is reflected in the RDF of the simulated ground truth in the third subplot of Fig. 4, where the peak g(r) value is reduced from 2.25 to 1.8.The lower energy injection rates also pushes the dissipation scale to lower k.This is demonstrated qualitatively in Fig. 8, where the setup's spectral energy starts falling off at a lower k compared to the other setups.
However, the network output does not show similar scaling.As shown in the figure, starting from r ≲ 10 −1 , the network consistently underpredicts the RDF.From 10 −2 ≲ r ≲ 10 −1 , the prediction is ≲ 3% from the target.However, the deviation quickly grows for scales r ≲ 10 −2 , reaching an error of ≈ 15% at a length of ≈ 5 × 10 −3 .This shows that while the network is able to partially learn the dependency of clustering on ⟨ϵ⟩, it is unable to account for the enhanced clustering at smaller scales.We note that all predicted RDFs go to 1 as r ≳ 10 −1 .This shows that the predicted dust density indeed goes to the average density at large scale, as expected from the definition of the RDF.
Similarly, reducing the energy injection rate will affect the gas velocity scale.By Kolmogorov scaling, the gas velocity in the inertial range should scale as v g ∝ ⟨ϵ⟩ 1/3 .Since the dust is driven by the gas via aerodynamic drag, we expect the relative velocity of the particles to scale partly as ∝ ⟨ϵ⟩ 1/3 too.Therefore, reducing ⟨ϵ⟩ → 0.1⟨ϵ⟩ should scale down the velocity by a factor of 10 1/3 ≈ 2.15.This scaling is roughly reflected in the third columns of Figs.6,7.The minimum and maximum target relative velocity scale of the fiducial average relative velocity and radial velocity are ≈ 5×10 −2 −2×10 −1 and ≈ 1.3 × 10 −1 − 3 × 10 −1 respectively, while that in this generalization test is ≈ 2×10 −2 −8×10 −2 and ≈ 9×10 −2 − 2×10 −1 respectively.Therefore, the target relative velocities are all smaller than the fiducial case by approximately the expected factor, and for small scale the difference roughly corresponds to the scaling we derived above.However, the network does not display similar scaling.As shown in Fig. 6, the network always over-predicts the relative velocity by a factor of at least ≈ 10%.Furthermore, the deviation grows steadily as the scale gets smaller, peaking at ≈ 40% error at the smallest scale.
Analogous results is shown in radial velocity statistics in Fig. 7.At all scales, the network over-predicts the statistics by at least 6%.This deviation continues to grow for smaller r and an ≈ 20% overshoot in radial velocity at grid-scale is observed.Furthermore, comparing the VG line and the network output in Fig. 6 and Fig. 7, we see that the predicted particle velocities are much higher than what would have been obtained if all dust was perfectly following the flow (VG).Hence, the network is likely predicting particle relative velocities that correspond to simulations with a higher velocity scale.This reflects the network's inability to extract correct injection energy scales from the gas velocity field.While this is a failure of the model, it is expected because the network is trained only with one energy injection rate.6.3.Different particle stopping time τ s Finally, we briefly discuss the results of a more challenging generalization test, in which we changed the dust stopping time τ s for the particles to 0.001, a hundred times smaller than that of the original simulation.On the other hand, the gas spectrum remains largely unchanged (See Fig. 8).Since ℓ τ ≈ τ 3/2 s ϵ 1/2 , this means the particle clustering scale will be scaled down to around 0.1% of the original flow.Again, by the same reasoning as §6.2, a flat ground-truth RDF is expected.This is reflected in the rightmost plot in Fig. 4. The violet line, which represents the RDF of the target flow, is nearly flat and only deviates slightly from 1 at r ≲ 10 −2 .The strong coupling means that the velocity of the particles in the simulation follows that of the gas closely at all scales.
In contrast to the less challenging generalization tests presented above, the large change in stopping time leads to a catastrophic failure in the prediction of the particle RDF.The results for the different velocity statistics, while not wrong by a similarly large factor, are systematically lower than the ground truth at all scales (See Fig. 6 & 7).These failures are to be expected because τ s is a physically important parameter that our model has no knowledge of.They illustrate the importance of including simulations with different stopping times as part of the training set for future models.

DISCUSSION
In this work, we have quantified the ability of the U-Net machine learning architecture to predict the clustering of particles that are aerodynamically coupled to turbulent gas.Training data was generated using a recently developed particle module for the ATHENA++ code, with the included physics and parameters chosen to be appropriate for the problem of small-scale turbulent clustering of dust in protoplanetary disks.After training, using a loss function that targets the three-dimensional particle density and velocity fields, we find that the U-Net model yields satisfactory performance for gridded representations of the statistical metrics that are of greatest interest for planet formation-the Radial Distribution Function and particle relative velocity.Errors on these quantities are scale-dependent, but typically at the percent level.Smaller maximum errors are found for density rather than velocity statistics.The model also generates, of course, a fast approximate mapping from the fluid to the particle fields in three dimensions.This is a novel capability that is not, to our knowledge, possible using analytic methods.
The use of a U-Net architecture requires a gridded representation of the particle density, and the definition of a gridded (single-valued) velocity field.These characteristics lead directly to some of the limitations observed in our study.The Lagrangian representation of particles in the training simulations yields useful information on particle statistics on scales smaller than those of the fluid grid, which is discarded as a pre-processing step in our approach.For applications where the sole interest is the statistical properties of the particles, better results extending to smaller scales would likely be obtained by using a convolutional neural network to target directly the radial distribution and relative velocity functions.
The model fails to generalize to different friction times and energy injection rates, as expected given that these are key physical parameters and the network was only exposed to a single value of them.There are no obvious obstacles to training the network on a suite of simulations and thereby learning the dependence of particle clustering on physical input parameters (see, e.g.analogous work in cosmology; Villaescusa-Navarro et al. 2021).The combination of threedimensional spatial prediction of particle positions, together with the prediction of the particles' velocity relative to the gas, would then allow the calculation of particle collision rates and outcomes for arbitrary size distributions.Different approaches would be needed to incorporate the back-reaction of particles on the gas, an aspect ignored in our work but which is central to models of planetesimal formation (Youdin & Goodman 2005;Simon et al. 2016).Learned simulators for turbulent flows show considerable promise (Dang et al. 2022), and it would be interesting to apply similar methods to multi-phase fluid turbulence.The existence of new instabilities that rely on the two-way coupling between gas and particles (Youdin & Goodman 2005;Squire & Hopkins 2018) suggests that multi-phase turbulence would be a stringent test of such methods.
The work presented here explores arguably the simplest problem in two-fluid turbulence that has relevance to planet formation.Related problems are present on multiple other scales in protoplanetary disks and in other domains, including atmospheric science and experimental studies of turbulence.Our results suggest that deep learning offers a complementary approach to direct numerical simulations that can predict particle clustering at a lower computational cost.In MC dropout, dropout layers are added before every weight layer with a dropout probability p. Layers with dropout drop weights randomly with a probability of p. Gal & Ghahramani (2016) has shown that by dropping out at validation time and averaging out the results, we can approximate the first moment of the predicted distribution as, where ŷi is the output of the i-th validation run using the input x and T is the number of forward passes used during validation.
For the second moment, we compute the variance of the predicted distribution as, where σ model is the model uncertainty and σ inherent is the data noise uncertainty.Since we want to focus on model uncertainty, we set σ inherent to zero, where ŷi and ŷMC take the same meaning as above.We trained 4 networks with dropout rates p = 0.1, 0.3, 0.5, 0.7 and sampled 100 times for each network.The particle statistics of the generated frames are presented in Fig. 12.We find, first, that the predicted mean of the dropout networks becomes less accurate when p increase.Second, the model fluctuations, represented by the color bars in the plot, are too small to be observed even with p = 0.7.This indicates that the results of the model are precise but not accurate.

C.2. SWA-Gaussian (SWAG)
We also performed sampling using another popular method: SWA-Gaussian (Maddox et al. 2019).In the method, we sample the model parameters near to those of the pre-trained network using a high constant learning rate.The collected model weights are then averaged to get the average weight θ SWA using a simple average.The low rank and diagonal approximation of the weights' covariance Σ low , Σ diag are also computed.This results in a Gaussian approximation of the posterior over weight parameters, which can be sampled to perform a Bayesian model average.In our case, we trained our SWAG network using a constant learning rate of lr = 0.02 for a few hundred epochs.The extracted θ SWA , Σ low , Σ diag are then used to generate 20 samples, which are used to give a model averaged RDF, relative velocities and their error bars.
We randomly selected 5 samples generated by SWAG and plotted slices of the output particle density ρ p and particle relative velocity field along the x-direction in Fig. 13.As shown in the figure, all ρ p fields generated by SWAG are of similar shape and have similar features.This suggests the model uncertainty of ρ p generated is small.On the other hand, two of the five slices of the output v x rel differ significantly both from the other slices and from the true output field shown in Fig. 5.This indicates a larger model uncertainty in the generated relative velocity field.
The particle statistics are presented in Fig. 12. From the figure, we see that the fluctuation predicted by SWAG is much larger than MC dropout.Hence, the error bars predicted by the two frameworks are not consistent.In agreement with what we see visually in output slices, we find that the uncertainty in velocity statistics as estimated using SWAG is much larger than that of particle density ρ p .However, even with larger error bars, none of the error bands are large enough to mask the discrepancy between the prediction and the output.Therefore, we conclude that the discrepancy between ML predictions and target is systematic in both SWAG and MC-Dropout.
In all statistics, the errors of the predicted mean using MC dropout are always larger than that of SWAG.This is expected, because by dropping out parameters randomly during training and validation, we effectively average a fraction of parameters in the model to take similar values.This should reduce the effective model size of MC-Dropout and reduce its expressive power.

Figure 1 .
Figure1.The architecture of the U-Net.The thickness of individual layers corresponds roughly to the number of channels in each layer and the width and height corresponds to the spatial size.The color scheme corresponds to: convolution layers of kernel size 3 and stride 1 (orange), Leaky ReLU activation with batch normalization (deep orange), downsampling layers using convolution of kernel size 2 and stride 2 (red) and upsampling layers using transposed convolution of kernel size 2 and stride 2 (purple).Residue connection is implemented for all convolution layers except those used in up and down sampling.Similar to other U-Net architectures, large scale information is incorporated by narrow down and concatenating compression layer outputs with the upsampled outputs.

Figure 2 .Figure 3 .
Figure2.Particle density field slice ρp from ATHENA++ (left), UNET-V (center, a network trained on the relative velocity between particles and gas) and UNET-R (right, a network trained on the absolute particle velocity), for a typical frame.Qualitatively, there is good agreement between the predicted structures and the ground truth, though the network outputs are visually less sharp than the target simulation data.

Figure 4 .
Figure 4. Predicted versus simulated strength of particle clustering as a function of spatial scale, quantified via the radial distribution function statistic (RDF).From left to right, the RDF plots show different simulation setups.The left-most panels show the fiducial setup, while the remaining panels show the results of generalization tests.(See §2.2,2.3 and §6 for details.) ATHENA++ shows the RDF computed directly from the simulation, UNET-R is the U-Net output trained with relative velocity between gas and dust, and UNET-V is the U-Net output trained directly on dust velocity.The first row compares the RDF of the simulation output and the U-Net prediction.The second row shows the ratio between the outputs and the target to better quantify the error.The zoom-in plot in the lower-right panel shows the peak of the error at small scales.

Figure 5 .
Figure5.Prediction of the spatial distribution of particle relative velocity (vg − vp), in domains of isotropic forced turbulence.The figure compares the ground truth, obtained with ATHENA++, with predictions from two deep learning models: UNET-V trained directly with particle velocity vp, and UNET-R with relative velocity v rel .The columns represent different setups (ATHENA++, UNET-V, UNET-R) and the rows show the 3 components of the velocity field.Each panel is a slice through the three-dimensional domain.

Figure 8 .
Figure 8. Compensated energy spectrum of driving flow vg for different simulation setups.The color bands illustrate the range of k used in driving the flow for each case and the line shows the resulting power spectrum of the gas kinetic energy.

Figure 11 .
Figure 11.Consistency of particle statistics across frames.We compute the statistics for the entire validation set and plot the median as the solid lines.The lower and upper edge of the color band represents the 5th percentile and 95th percentile of the distribution.Therefore, 90% of the values computed across the validation set lie within the band.

Figure 12 .
Figure 12.Comparison between SWAG output and MC Dropout outputs with different dropout probabilities.Samples generated by the networks are used to generate the statistics, and the distribution is plotted.The solid lines are the median, and the upper and lower portion of the color bars are the upper and lower quantile of the distribution.Note that the color bars for MC dropouts are all too small to be visible on the graph.