Adversarial reverse mapping of equilibrated condensed-phase molecular structures

A tight and consistent link between resolutions is crucial to further expand the impact of multiscale modeling for complex materials. We herein tackle the generation of condensed molecular structures as a refinement (backmapping) of a coarse-grained (CG) structure. Traditional schemes start from a rough coarse-to-fine mapping and perform further energy minimization and molecular dynamics simulations to equilibrate the system. In this study we introduce DeepBackmap: a deep-neural-network-based approach to directly predict equilibrated molecular structures for condensed-phase systems. We use generative adversarial networks to learn the Boltzmann distribution from training data and realize reverse mapping by using the CG structure as a conditional input. We apply our method to a challenging condensed-phase polymeric system. We observe that the model trained in a melt has remarkable transferability to the crystalline phase. The combination of data-driven and physics-based aspects of our architecture helps reach temperature transferability with only limited training data.


I. INTRODUCTION
Computational modeling of soft-matter systems inherently requires the consideration of a wide range of time and length scales, where microscopic interactions can impact meso- to macroscopic changes. 1 Setting aside quantum mechanics, even molecular dynamics (MD) quickly reaches its limits when probing long relaxation times. Circumventing such limitations remains an area of active research, motivated in part by the promises of computational soft materials discovery. 2 Various strategies aim at breaking the natural limitations of MD, from enhanced-sampling techniques 3 to dedicated hardware 4 to hierarchical multiscale modeling. 1,5,6 Multiscale modeling relies on several levels of resolution, striving to make best use of each level. At the lower end, a coarse-grained (CG) resolution will map groups of atoms to single interaction sites or beads. [8][9] The reduced representation eliminates some molecular friction, smoothens the energy landscape, and thereby effectively accelerates sampling of the conformational space.
While mapping from fine to coarse is straightforward, going the reverse way is no trivial task. Backmapping means reintroducing lost degrees of freedom: from CG beads to atoms. The reduced CG resolution implies that one CG configuration will correspond to an ensemble of atomistic microstates. Ideally, the CG model should perfectly reproduce the Boltzmann distribution of the atomistic system along the CG degrees of freedom (the many-body potential of mean force). As such, backmapping aims at generating an atomistic structure drawn from the probability distribution of atomistic microstates, given the CG configuration.
The general strategy of existing backmapping schemes is to insert an initial set of atomistic coordinates into the coarse-grained structure. 10 Two major approaches are random placement of the atoms close to their corresponding coarse-grained bead center 11,12 or insertion of presampled fragments from a correctly sampled distribution of all-atom structures. 1,13,14 In both cases energy minimization is required to relax the initial atomistic configuration, and a subsequent molecular dynamics simulation has to be performed to equilibrate the system to obtain the correct Boltzmann distribution.
arXiv:2003.07753v1 [cond-mat.soft] 17 Mar 2020
The computational cost of the subsequent minimization and equilibration procedures can become significant for high-dimensional systems. This is also true for backmapping large numbers of coarse-grained configurations. 15 Furthermore, generating the initial atomistic structure often requires human intuition to avoid trapping in local minima. For example, the protocol of Wassenaar et al. needs to introduce geometric modifiers to correctly reproduce the distribution of torsion angles in phospholipids. 12
In this work we introduce DeepBackmap, a backmapping scheme based on deep convolutional neural networks (deep CNNs). We bypass computationally expensive energy minimization and molecular dynamics simulations by predicting equilibrated atomistic structures directly from the coarse-grained configuration. This is achieved using generative adversarial networks (GANs), [16][17][18] a particular type of generative model based on deep networks. During training, an auxiliary critic learns a distance metric between generated and training data. While the critic is trained to maximize the distance, the objective of the generator is to minimize it.
The seemingly unintuitive training protocol of GANs circumvents the hard problem of fitting the posterior distribution to training data. Instead of explicitly learning the distribution, GANs only tune a sampler (the generator) to produce samples that the critic cannot distinguish from the training distribution. For high-dimensional data sets, such as the joint distribution of many atoms in molecules, previous methods either become intractable or lose resolution, dependencies, or both.
To extend GANs to a conditional model, an auxiliary input, taken to be the conditional variable, can be introduced to both the generator and the critic. 19,20 Here we use conditional GANs to learn a coarse-to-fine mapping that reintroduces degrees of freedom with the correct statistical weight. To this end, we use the coarse-grained structure as an auxiliary input.
Generating low-energy geometries for molecular compounds remains a challenge that is still tackled largely by MD simulations. Recent approaches using machine learning (ML) include autoregressive models, 21,22 invertible neural networks, 23 Euclidean distance matrices, 24 and graph neural networks. 25 Our study uses a convolutional GAN, which has shown the ability to model highly complex and detailed probability distributions (statistical dependency structures) in computer vision applications. 26 However, it requires a regular discretization of 3D space, prohibiting scaling to larger spatial structures. We therefore combine the convolutional generator with an autoregressive approach that, in an outer loop, reconstructs the fine-grained structure incrementally, atom by atom. In each step, we use only local information, making the method scalable to arbitrary system sizes. Our method can be used to generate near-equilibrium configurations for condensed-phase systems.
Backmapping molecules in vacuum can be relatively straightforward, but the challenge is to achieve it in a condensed phase. We test our approach on a dense polymeric system: syndiotactic polystyrene (sPS). 8][29] An illustration of the coarse-grained and the atomistic representation of the molecule can be found in Fig. 1. When trained solely on data obtained from a high-temperature melt, the model is transferable to lower temperatures where the system is in a crystalline phase. This indicates that the microscopic degrees of freedom learned by the model have weak temperature dependence and can be generated solely from large-scale features captured in the coarse-grained structure.

II. MACHINE LEARNING MODEL
In the following, we discuss our approach. We start with a description of the molecular simulation scenario our method handles, and then discuss in detail how the deep backmapping algorithm works.

A. Setup
We define notation for the coarse-grained and atomistic resolutions, as well as the backmapping procedure.
Coarse-grained resolution: Let $\{A_I = (R_I, C_I) \mid I = 1, \dots, N\}$ denote the set of $N$ coarse-grained beads. Each bead has position $R_I \in \mathbb{R}^3$ and bead type $C_I$.
Atomistic resolution: Let $\{a_i = (r_i, c_i) \mid i = 1, \dots, n\}$ denote the set of $n$ atoms, with position $r_i \in \mathbb{R}^3$ and atom type $c_i$. We denote by $\phi_I \subset \{a_i \mid i = 1, \dots, n\}$ the set of atoms contained in the coarse-grained bead $A_I$.
Backmapping: Backmapping requires us to generate a set of $n$ atom positions $r_1, \dots, r_n$ conditional on the coarse-grained (CG) structure, given by the $N$ beads $A_1, \dots, A_N$, as well as the atom types $c_1, \dots, c_n$. We express this problem as a conditional probability $p(r_1, \dots, r_n \mid c_1, \dots, c_n, A_1, \dots, A_N)$.
We now propose a machine learning (ML) technique that takes pairs of corresponding coarse- and fine-grained structures as input and learns the conditional distribution $p$ from this training data. Specifically, we do not learn $p$ directly, which is well known to be a hard problem for high-dimensional phase spaces, 16 but rather infer a sampler that can generate further samples from $p$ (see Fig. 2).
Rather than modeling $p(r_1, \dots, r_n \mid c_1, \dots, c_n, A_1, \dots, A_N)$ directly, we propose to factorize $p$ in terms of atomic contributions, where the generation of one specific atom becomes conditional on both the CG beads and all the atoms previously reconstructed. 22 Based on this factorization we can train a generative network, $G$, to generate and refine the atom positions sequentially.
The backmapping scheme consists of two steps. (i) An initial structure is generated using the autoregressive factorization of $p$, in which an ordering $S$ sorts the atoms in the order of reconstruction and $\{r_{S(1)}, \dots, r_{S(i-1)}\}$ are the atoms that have already been reconstructed when atom $S(i)$ is placed. The dependence on earlier predictions of $G$ makes our approach autoregressive. This procedure would be exact in a Markovian regime where each atom interacts directly only with its predecessor and successor (so-called "chain structures" 30). Unfortunately, the complexity of condensed-phase liquids calls for more feedback to avoid steric clashes. (ii) Intuitively, we cannot optimally place an atom without its whole environment present. This issue is compounded for ring-like structures, like the phenyl group in polystyrene. We therefore perform a variant of Gibbs sampling, which iteratively resamples along the sequence $S$ several times. 31 Each further iteration still updates one atom at a time, but uses the knowledge of all other atoms. Experiments confirmed that such Gibbs sampling leads to a good approximation of $p$, even with a small number of iterations and a fixed atom ordering.
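The factorization referred to in step (i) is not displayed in this text. Written out from the definitions of Sec. II A, an autoregressive factorization along the ordering $S$ could take the form below. Which atom types enter each conditional is an assumption made here for illustration.

```latex
p(r_1, \dots, r_n \mid c_1, \dots, c_n, A_1, \dots, A_N)
  = \prod_{i=1}^{n}
    p\bigl(r_{S(i)} \,\big|\, r_{S(1)}, \dots, r_{S(i-1)},\;
           c_{S(1)}, \dots, c_{S(i)},\; A_1, \dots, A_N\bigr)
```

In this reading, each factor is what the generator $G$ samples from in the initial pass, whereas the Gibbs-type refinement of step (ii) conditions the position of atom $S(i)$ on all other atoms rather than only on its predecessors.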

C. Representation
Iterative sampling algorithms, such as the Gibbs sampler, have high computational cost. We therefore rely on a robust learning algorithm that can capture complex dependencies in the local environment directly.
The problem of learning complex, high-dimensional, and high-order dependencies in generative models has received considerable attention in computer vision. The most successful techniques for this task are generative deep convolutional neural networks 32 (deep CNNs) trained by adversarial training. 16,26 There is also growing evidence that deep networks are effective in capturing the statistics of physical systems. 22,23
In order to leverage deep CNNs for our task, an explicit spatial discretization of the ambient space, similar to pixels in an image, is required. The standard technique is to use a voxel-based representation. 33 To this end, we represent atoms and CG beads with smooth densities, $\gamma(x)$ and $\Gamma(x)$, respectively.
The particle densities are modeled using Gaussian distributions: atom $i$ is assigned a density $\gamma_i(x)$, where $x$ is the spatial location in Cartesian coordinates, expressed on a discretized grid due to the voxel representation. The density is centered on the particle position $r_i$ with Gaussian width $\sigma$, treated as a hyperparameter. CG beads are represented similarly.
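As an illustration of the smooth-density representation, the sketch below evaluates a Gaussian centered on a particle at the voxel centers of a cubic grid. It is a minimal sketch under stated assumptions, not the authors' implementation: the unnormalized form $\exp(-\lVert x - r_i\rVert^2 / 2\sigma^2)$, the grid resolution, and the value of `sigma` are all illustrative choices, and the function name is hypothetical.

```python
import math


def gaussian_density(center, sigma, r_cut, n_voxels):
    """Evaluate exp(-|x - center|^2 / (2 sigma^2)) on an n_voxels^3 grid.

    The grid covers the cube (-r_cut, r_cut)^3; values are taken at voxel
    centers. The unnormalized Gaussian is an assumption of this sketch.
    """
    spacing = 2.0 * r_cut / n_voxels
    axis = [-r_cut + (k + 0.5) * spacing for k in range(n_voxels)]
    inv_two_sigma_sq = 1.0 / (2.0 * sigma * sigma)
    return [[[math.exp(-((x - center[0]) ** 2
                         + (y - center[1]) ** 2
                         + (z - center[2]) ** 2) * inv_two_sigma_sq)
              for z in axis] for y in axis] for x in axis]


# Illustrative values only: 8 voxels per axis, sigma = 0.5 (same units as r_cut).
grid = gaussian_density(center=(0.75, -0.75, 0.75), sigma=0.5, r_cut=6.0, n_voxels=8)
```

A pure-Python nested list keeps the sketch free of dependencies. In practice an array library would be the natural choice for grids of this kind.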

Locality
The high cost of large regular 3D grids is the reason for employing deep CNNs only locally and using the previously described outer loop to build up larger structures incrementally by autoregressive sampling. To make the model scalable to large system sizes, we assume locality by limiting the information about the environment to a cutoff $r_\mathrm{cut}$.
We encode the local environment of an atom $i$ or CG bead $I$ by means of the density of particles placed around it, denoted $\xi_{i,I}$ and $\Xi_I$, respectively. We sum over all atoms or beads within a cubic environment of size $2 r_\mathrm{cut}$. We shift all atom and bead positions so that they are centered on the CG bead of interest, $I$. Further, we rotate the local environment to a local axis system. This improves generalization from limited training examples by removing three translational and two of the rotational degrees of freedom, i.e., the ML algorithm does not need to learn the corresponding coordinate invariance from (additional) examples.
Specifically, we align the bond between consecutive CG beads $I-1$ and $I$ to the local $z$ axis using a rotation matrix $M_I$ to construct the local environment of atom $i$, which extends over the region $-r_\mathrm{cut} < x_\alpha < r_\mathrm{cut}$, where $\alpha$ runs over the three Cartesian coordinates. The coarse-grained environment is constructed similarly. In this work we set $r_\mathrm{cut} = 6$ Å, such that several CG beads are included in the local environment (see Fig. S2).
Importantly, $\xi_{i,I}$ and $\Xi_I$ are discretized on a regular grid.

Feature embedding
A CNN takes an image (typically 2D or 3D) as input where every pixel or voxel is vector-valued. For example, an RGB image consists of three feature channels: one channel for every primary color. Here, we store a number of feature channels in each voxel that represent the presence of other atoms or beads of a certain kind. In the most basic version, we could use a single feature channel to encode all other atoms. However, this would make it impossible to distinguish their type and might also lead to clutter. The opposite extreme would be to assign a separate feature channel to each atom. The downside here is not only increased memory costs but, more importantly, the loss of permutation invariance of the atoms.
As shown in Fig. 3a, we create separate feature channels for each atom type. Atom types are distinguished not only by element but additionally by chemical similarity, i.e., atoms of a given type can be treated as identical in the MD simulation. Specifically, we classify similarity following the force field for sPS by Mueller-Plathe. 34 For atoms of the same type, we further add channels to distinguish the functional form of interaction with the current atom of interest. Interaction types distinguish between bond, bending angle, torsion, and Lennard-Jones. Similarly, we use separate channels to encode the different coarse-grained bead types.
Formally, let $f \in \{1, 2, \dots, N_F\}$ denote the index of the $N_F$ different feature channels. We define the activation function $h_f(S(j))$ to denote the association of atom $S(j)$ with channel $f$, and $H_f(J)$ to similarly encode the bead type. We then build a density map for each channel, for both atomic and coarse-grained environments.
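The per-channel density maps are not displayed in this text. One form consistent with the definitions above is sketched below, as a reconstruction under assumptions rather than the authors' exact expressions: the activation functions are taken to be indicator-like ($h_f, H_f \in \{0, 1\}$), densities are unnormalized Gaussians evaluated in the local frame of bead $I$, and $\mathcal{E}_i$ is a hypothetical symbol for the set of atoms already present in the environment when atom $i$ is placed.

```latex
r'_{j} = M_I\,(r_{j} - R_I), \qquad R'_{J} = M_I\,(R_{J} - R_I),
\\[4pt]
\xi^{f}_{i,I}(x) = \sum_{j \in \mathcal{E}_i} h_f(j)\,
    \exp\!\left(-\frac{\lVert x - r'_{j}\rVert^{2}}{2\sigma^{2}}\right),
\qquad
\Xi^{f}_{I}(x) = \sum_{J} H_f(J)\,
    \exp\!\left(-\frac{\lVert x - R'_{J}\rVert^{2}}{2\sigma^{2}}\right),
\\[4pt]
-r_\mathrm{cut} < x_\alpha < r_\mathrm{cut}, \quad \alpha \in \{x, y, z\}
```

Summing densities within a channel is what preserves permutation invariance among atoms of the same type and interaction class.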

D. Generative model
Training a generative model is challenging, as it requires measuring and optimizing the closeness of the target distribution and the distribution generated by the model. Direct maximum-likelihood training, where the model's parameters are tuned such that the likelihood of observing the data given the model is maximized, is infeasible in high dimensions because the normalization factor (the partition function) cannot be computed efficiently.
Approaches to circumvent these limitations include approximate techniques like variational autoencoders, where a stochastic lower bound of the log-likelihood is optimized. Another solution is offered by likelihood-free methods, such as adversarial training, 16 that operate indirectly, by building a sampler and comparing its output to actual data with a second, "adversarial" network. In recent literature, this approach appears to yield the strongest results, in particular on high-dimensional and hard-to-model image spaces. 26 The next best option is autoregressive modeling, which tackles the complexity issue by learning single decisions at a time. 21 We use this approach in the outer loop but employ the more expressive GAN for modeling the local placement of atoms.
Formally, to perform adversarial training, a second network, called the critic $C$, is introduced to distinguish between training samples and samples from the generative model $G$. The generator competes with the critic $C$ and is trained to generate samples that $C$ can no longer distinguish from training samples. In the conditional adversarial framework 19,20 both networks $G$ and $C$ are provided with auxiliary information, like a class label, to generate samples related to this information. In this study, we use a conditional generative adversarial network (cGAN) to generate new atom positions from a random noise vector $z \sim \mathcal{N}(0, 1)$ and the conditional input $u_i := \{\xi_{i,I}, \Xi_I, c_i\}$ consisting of the local environment representation $\xi_{i,I}$ and $\Xi_I$, as well as the current atom type $c_i$. In a first step, the generator $G$ predicts a smooth-density representation $\tilde{\gamma}_i := G(u_i, z)$.

From densities to coordinates
While the smooth-density representation $\tilde{\gamma}_i$ is adequate for a CNN, we ultimately wish to collapse these densities back to point coordinates. We simply compute a density-weighted average of the positions, discretized over the voxel grid. This step is performed for each generated density separately, one atom at a time. We note that this density-collapse step is differentiable and can thus be easily incorporated in a loss function.
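The density-collapse step can be illustrated with a short sketch that computes the density-weighted mean of the voxel centers, $\sum_x x\,\rho(x) / \sum_x \rho(x)$. The grid layout, the helper names, and the toy Gaussian used as input are assumptions made for this example; only the weighted-average idea comes from the text.

```python
import math


def voxel_axis(r_cut, n_voxels):
    """Voxel-center coordinates along one axis of a cube spanning (-r_cut, r_cut)."""
    spacing = 2.0 * r_cut / n_voxels
    return [-r_cut + (k + 0.5) * spacing for k in range(n_voxels)]


def collapse_density(density, axis):
    """Density-weighted mean of the voxel centers: sum_x x*rho(x) / sum_x rho(x)."""
    total = 0.0
    weighted = [0.0, 0.0, 0.0]
    for ix, x in enumerate(axis):
        for iy, y in enumerate(axis):
            for iz, z in enumerate(axis):
                w = density[ix][iy][iz]
                total += w
                weighted[0] += w * x
                weighted[1] += w * y
                weighted[2] += w * z
    return [c / total for c in weighted]


# Toy input: a Gaussian blob around a known point (all values are illustrative).
axis = voxel_axis(r_cut=6.0, n_voxels=24)
target, sigma = (0.3, -0.4, 1.1), 0.6
blob = [[[math.exp(-((x - target[0]) ** 2
                     + (y - target[1]) ** 2
                     + (z - target[2]) ** 2) / (2.0 * sigma ** 2))
          for z in axis] for y in axis] for x in axis]
recovered = collapse_density(blob, axis)  # close to `target`
```

Because the result is a ratio of sums that are linear in the density values, gradients with respect to the density are well defined wherever the total density is nonzero, which is what allows the step to sit inside a loss function.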

Training
Training a GAN involves two networks: the adversarial critic and the generative network. The following describes the two loss functions.
We train a critic network $C$ to distinguish between reference densities $\gamma_i$ related to the conditional input $u_i = \{\xi_{i,I}, \Xi_I, c_i\}$ and generated densities $\tilde{\gamma}_i = G(u_i, z)$. The critic aims at both (i) distinguishing reference from generated samples and (ii) ensuring smoothness of the classification with respect to the generator's parameters. Both criteria can be fulfilled using a variant of adversarial models where the critic $C$ is used to approximate the Wasserstein distance. 17 The loss function is constructed using the Kantorovich-Rubinstein duality, which requires $C$ to be constrained to the set of 1-Lipschitz functions. A differentiable function is 1-Lipschitz if and only if it has gradients everywhere with norm at most one. A soft version of this constraint is enforced with a penalty on the gradient norm, 18 evaluated at points $(u_i, \hat{\gamma}_i)$ interpolated linearly between pairs of points $(u_i, \gamma_i)$ and $(u_i, G(u_i, z))$. The prefactor $\lambda_\mathrm{gp}$ scales the weight of the gradient penalty.
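The critic loss itself is not displayed in this text. The standard Wasserstein objective with gradient penalty, which matches the description above, reads as follows. The sign convention, the two-sided form of the penalty, the uniform interpolation variable $\epsilon$, and taking the gradient with respect to the interpolated density are assumptions carried over from the usual formulation rather than confirmed details of this work.

```latex
L_C = \mathbb{E}_{z}\bigl[C\bigl(u_i, G(u_i, z)\bigr)\bigr]
    - \mathbb{E}\bigl[C(u_i, \gamma_i)\bigr]
    + \lambda_\mathrm{gp}\,
      \mathbb{E}_{\epsilon}\Bigl[\bigl(\lVert \nabla_{\hat{\gamma}_i}
          C(u_i, \hat{\gamma}_i) \rVert_2 - 1\bigr)^{2}\Bigr],
\\[4pt]
\hat{\gamma}_i = \epsilon\,\gamma_i + (1 - \epsilon)\,G(u_i, z),
\qquad \epsilon \sim U[0, 1]
```

Minimizing the first two terms drives the critic to score reference densities higher than generated ones, while the last term keeps its gradient norm close to one along the interpolation path, as a soft substitute for the 1-Lipschitz constraint.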
For the generator we combine two aspects to help generate faithful structures: (i) the critic that compares reference and generated samples, $C(u_i, G(u_i, z))$, and (ii) a physical prior, $\Phi$. The prior aims at accelerating convergence by helping the generator refine its output. It combines force-field-based energy contributions, $E_\mathrm{FF}$, and a geometric center-of-mass distance contribution, $d_\mathrm{COM}$. The prior depends on the set of atoms corresponding to a coarse-grained bead, $\phi_I$ for reference atoms and $\tilde{\phi}_I$ for generated atoms, as well as on the reference atoms $N_I$ in the local neighborhood that belong to different beads.
The force-field-based term $E_\mathrm{FF}$ penalizes discrepancies between reference and generated samples with respect to specific intra- and intermolecular interactions within all neighborhoods $N_I$. It runs over the interaction types $t$: intramolecular bond, angle, and dihedral, and non-bonded Lennard-Jones. The set of interactions follows the reference atomistic force field. In the following, let $\theta_I = \{i \mid a_i \in \phi_I\}$ be the set of atom indices for atoms contained in $\phi_I$. The second term in the physical prior, $d_\mathrm{COM}$, penalizes discrepancies between samples in the center-of-mass geometry, where $g$ refers to the center of mass and $m_i$ is the mass of atom $a_i$. Overall, the loss function for the generator, $L_G$, combines the critic term with the physical prior, whose weight is scaled by the prefactor $\lambda_\Phi$.
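The displayed expressions for the prior and the generator loss are missing from this text. One form consistent with the description is given below as a reconstruction, not as the authors' exact equations. The absolute-value discrepancy, the unweighted sum over interaction types, the additive combination of the two prior terms, and the symbol $e_t$ for the energy of interaction type $t$ are all assumptions. Only the center-of-mass definition $g$ is standard.

```latex
L_G = -\,\mathbb{E}_{z}\bigl[C\bigl(u_i, G(u_i, z)\bigr)\bigr]
      + \lambda_\Phi\,\Phi\bigl(\phi_I, \tilde{\phi}_I, N_I\bigr),
\qquad
\Phi = E_\mathrm{FF} + d_\mathrm{COM},
\\[4pt]
E_\mathrm{FF} = \sum_{t}\,
    \bigl|\, e_t(\tilde{\phi}_I, N_I) - e_t(\phi_I, N_I) \,\bigr|,
\qquad
d_\mathrm{COM} = \bigl\lVert g(\tilde{\phi}_I) - g(\phi_I) \bigr\rVert,
\qquad
g(\phi_I) = \frac{\sum_{i \in \theta_I} m_i\, r_i}{\sum_{i \in \theta_I} m_i}
```

Under this reading the prior vanishes when the generated atoms reproduce both the per-type energies and the center of mass of the reference atoms, so it acts as a guide toward the reference statistics rather than as an energy minimizer.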
The two networks are trained iteratively and alternatingly, minimizing $L_C$ and $L_G$ in turn, until the process reaches equilibrium.

Implementation details
We choose a 3D convolutional neural network (CNN) architecture with residual connections for G and C. 35 See Fig. S6 for a detailed network description.
The model is trained for 38 660 iterations in total using a batch size of 36. For stability reasons, we start training with $\lambda_\Phi = 0$ and increase it smoothly to $\lambda_\Phi = 0.01$ from step 6000 to 10 000. Training is performed using the Adam optimizer. The prefactor scaling the weight of the gradient penalty term is set to $\lambda_\mathrm{gp} = 0.1$. To obtain reliable gradients for the generator, the critic should be trained to optimality. Therefore the critic $C$ is trained five times in each iteration while the generator $G$ is trained just once.
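The schedule described above can be sketched as follows. The step numbers, the final weight, and the 5:1 critic-to-generator ratio come from the text. The linear shape of the ramp is an assumption, since the text only says the weight is increased "smoothly", and the function names are hypothetical.

```python
def prior_weight(step, start=6000, end=10000, final=0.01):
    """Weight lambda_Phi of the physical prior at a given training step.

    Zero up to `start`, `final` from `end` onward, linear in between
    (the linear ramp is an assumption of this sketch).
    """
    if step <= start:
        return 0.0
    if step >= end:
        return final
    return final * (step - start) / (end - start)


N_CRITIC = 5  # critic updates per generator update


def training_plan(n_iterations):
    """Yield (network, step, lambda_Phi) tuples in the order updates would run."""
    for step in range(n_iterations):
        for _ in range(N_CRITIC):
            yield ("critic", step, None)
        yield ("generator", step, prior_weight(step))


plan = list(training_plan(2))  # five critic updates, then one generator update, twice
```

Keeping the prior switched off for the first several thousand steps lets the adversarial signal shape the generator before energy terms, which can be very large for overlapping atoms, enter the loss.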
We train the model recurrently on atom sequences containing either all heavy (carbon) or light (hydrogen) atoms corresponding to a single coarse-grained bead.
During training, the initial atomistic environment representation $\xi_{i,I}$ for each sequence is generated from training data and contains the atoms present (according to the order $S$) in the local neighborhood $N_I$ of bead $I$. After each step, the generated atom density is added to the local environment representation for the next atom in the sequence, as illustrated in Fig. 3b, until all atoms of the sequence are generated.
In the Gibbs-sampling step, information about all preceding and subsequent atoms is used to refine the positions of light atoms. For heavy atoms, on the other hand, we remove hydrogens from the current and adjacent beads such that misplaced hydrogens do not hinder $G$ from finding suitable positions for the heavy atoms.
Note that our architecture is not fully rotationally equivariant, as it only aligns the region considered by the generator according to the position of the central bead and the difference vector to the previous bead. This leaves one rotational degree of freedom around that axis; therefore, we augment the training set using rotations about that axis. During prediction we feed different orientations about said axis as well and choose the structure with the lowest energy from the generated ensemble.
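The two-stage sampling (an autoregressive initial pass followed by Gibbs-type sweeps) can be sketched as a pair of loops. This is a control-flow illustration only: `generate` is a hypothetical stand-in for the trained generator, `toy_generator` is a placeholder with no physical meaning, the number of sweeps is arbitrary, and the special treatment of hydrogens during heavy-atom refinement is omitted.

```python
import random


def backmap(order, generate, n_gibbs=2, seed=0):
    """Place atoms in the order S, then refine them with Gibbs-style sweeps.

    `generate(atom, context, rng)` returns a position for `atom` given the
    positions of the atoms in `context` (a dict atom -> (x, y, z)).
    """
    rng = random.Random(seed)
    positions = {}
    # (i) Initial pass: each atom only sees the atoms placed before it.
    for atom in order:
        positions[atom] = generate(atom, dict(positions), rng)
    # (ii) Refinement: resample one atom at a time given all other atoms.
    for _ in range(n_gibbs):
        for atom in order:
            context = {a: p for a, p in positions.items() if a != atom}
            positions[atom] = generate(atom, context, rng)
    return positions


def toy_generator(atom, context, rng):
    """Placeholder: put the atom near the centroid of its context."""
    if context:
        center = [sum(p[k] for p in context.values()) / len(context)
                  for k in range(3)]
    else:
        center = [0.0, 0.0, 0.0]
    return tuple(c + rng.gauss(0.0, 0.1) for c in center)


structure = backmap(order=[0, 1, 2, 3], generate=toy_generator)
```

The only difference between the two stages is the context handed to the generator: predecessors only in the first pass, all other atoms in every later sweep.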
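Selecting among orientations about the remaining axis can be sketched as below. The callables `predict` (standing in for the generator applied in a rotated frame) and `energy` (standing in for a force-field evaluation) are hypothetical, and the number of trial angles is an assumption, since the text does not state how many orientations are used.

```python
import math


def rotate_z(points, angle):
    """Rotate a list of (x, y, z) points by `angle` (radians) about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def best_orientation(environment, predict, energy, n_angles=8):
    """Try several orientations about z and keep the lowest-energy prediction.

    Returns a tuple (energy, angle, structure) for the best candidate.
    """
    best = None
    for k in range(n_angles):
        angle = 2.0 * math.pi * k / n_angles
        # Predict in the rotated frame, then rotate the result back.
        candidate = rotate_z(predict(rotate_z(environment, angle)), -angle)
        e = energy(candidate)
        if best is None or e < best[0]:
            best = (e, angle, candidate)
    return best
```

Rotating the prediction back before evaluating the energy keeps all candidates in the same frame, so their energies are directly comparable.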

A. Reference data
The atomistic data in this study were reported in Liu et al.; 29 the underlying force field is based on the work of Mueller-Plathe. 34 Replica-exchange MD simulation, a temperature-based enhanced-sampling technique, was used to sample the system. All simulations were performed using the molecular dynamics package GROMACS 4.6. 36 The simulations were run in the NPT ensemble using the velocity-rescaling thermostat and the Parrinello-Rahman barostat, with an integration timestep of 1 fs. For additional details regarding the simulations the reader is referred to the work of Liu et al. 29
Our training/test data consist of pairs of corresponding fine- and coarse-grained snapshots. To this end, we start from the atomistic frame and apply a fine-to-coarse mapping to obtain the coarse-grained structures. We use uncorrelated snapshots from three different trajectories simulated at T = 568 K, 453 K, and 313 K. The system includes 36 polystyrene chains and each chain consists of 10 monomers.
The fine-to-coarse mapping is based on the coarse-grained model developed by Fritz et al. 28 It represents the coarse-grained molecule as a linear chain, where each monomer is mapped onto two CG beads of different types, denoted A for the chain backbone and B for the phenyl ring (see Fig. 1). Bonds are created between the backbone beads A-A and between backbone and phenyl ring beads A-B. The coarse-grained model, parameterized in the melt, is transferable to the crystalline phase and stabilizes the experimentally observed α and β polymorphs. 29

B. Baseline Model
We compare our results with a generic backmapping scheme developed by Wassenaar et al. 12 This method places each particle on the weighted average position of the coarse-grained beads it belongs to and optionally adds a random displacement. The protocol continues with corrections to the structure using geometric modifiers, setting the alignment of the next particle as cis, trans, out, or chiral with respect to the others. Note that those modifiers must first be defined manually by the user.
The corrected structure is then relaxed by a force-field-based energy minimization. The first cycle of energy minimization consists of 200 steps and is performed with non-bonded interactions turned off. The second cycle of energy minimization consists of 5000 steps with all interactions turned on. Clearly the energy-minimized structures will not capture the right Boltzmann distribution, and therefore the protocol of Wassenaar et al. continues with several cycles of position-restrained molecular dynamics simulations. Since we aim for a backmapping scheme that performs well without running molecular dynamics simulations, we stop the protocol after the energy minimization and compare the methods at that stage.

IV. RESULTS
We apply DeepBackmap to a challenging condensed-phase molecular system: syndiotactic polystyrene. Despite its simple chemical structure, polystyrene displays a rich conformational space. Its syndiotactic form can crystallize, and exhibits complex polymorphic behavior. Upon thermal annealing, a polystyrene melt undergoes a phase transition from an amorphous to a crystalline phase at T ≈ 450 K. The CG model was shown to stabilize the two main crystal polymorphs α and β (see Fig. 4). 29

FIG. 4. Polymorphism of polystyrene. At high temperature (T = 568 K) the system stabilizes an amorphous phase. At lower temperatures the CG model mostly stabilizes the α polymorph at T = 453 K and the β polymorph at T = 313 K. We train DeepBackmap solely on the high-temperature ensemble (T = 568 K) and test its transferability to the lower temperatures.
We probe the model's ability to transfer across temperatures. To this end, we train DeepBackmap solely on high-temperature, amorphous configurations, but validate it at several temperatures (Fig. 4). The training set consisted of only 12 snapshots simulated at T = 568 K. The model was then applied to MD configurations at T = 568 K, 453 K, and 313 K, each containing 78 samples that were not used during training. For brevity we only report results for the highest and lowest temperatures. We evaluate the performance of the model regarding its ability to reproduce structural and energetic features of the reference atomistic configurations, and compare it with the baseline method.

A. Local structural and energetic features
Fig. 5 shows distribution functions for structural and energetic properties. We first analyze the hold-out validation data at T = 568 K (right column), the temperature at which DeepBackmap was trained. Our method generates configurations that are remarkably close to the reference Boltzmann distribution ("AA"), especially when considering the current state of the art. The distributions of the intramolecular carbon backbone angle and dihedral show very good agreement (Fig. 5a-d). The baseline method, on the other hand, displays distributions that are too narrow, as well as spurious peaks. While the distribution for the carbon improper dihedral of the aromatic structure is slightly too narrow, we emphasize the small range of angles (Fig. 5e-f), due to the imposed planarity of the ring. The baseline method significantly suppresses fluctuations around the planar structure. The Lennard-Jones energies shown in Fig. 5g-h, obtained for each chain separately, also match the reference distribution remarkably well; this aspect is of tremendous importance to generate well-equilibrated structures in a condensed environment. We do observe slightly enlarged high-energy tails, often due to an accumulation of errors where misplaced atoms impact the subsequent placement of neighbors in our autoregressive approach. The baseline model, on the other hand, systematically and drastically over-stabilizes the system. This results from the energy-minimization scheme, which fails to prepare the structure for a specific canonical state point. For this reason, state-of-the-art backmapping schemes require extensive MD simulations, including lengthy heating procedures and thermostat/barostat equilibration, before offering a starting point for a production run.

B. Transferability to low temperatures
While we fix the original training of DeepBackmap to the high-temperature ensemble, we here test it at low temperature (T = 313 K), without reparametrization. Beyond a mere shift in temperature, the system undergoes a phase transition, going from an amorphous phase to a crystalline state with different polymorphs. The distributions in Fig. 5a-g (left column) show remarkable accuracy: DeepBackmap retains the performance displayed at the training temperature. Upon cooling, the distributions do show a number of significant changes: a narrower distribution in the angle, the vanishing of the side peak in the backbone dihedral, and a large shift of the Lennard-Jones energies.
The transferability of DeepBackmap is highlighted when compared to the baseline model, which retains many of the features found at high temperature. This is especially apparent for the side peak of the backbone dihedral.

C. MD simulation
Backmapped structures are often used as starting points for further MD simulations. State-of-the-art backmapping schemes rely on lengthy preparations to offer a starting point for a production run, such as a heat-up phase and thermostat/barostat equilibration. Fig. SI.7 displays the evolution of the potential energy during MD simulations without heat-up at T = 313 K, starting from structures generated with the different methods. Initial velocities are generated according to a Maxwell distribution. The potential energy of structures generated with DeepBackmap closely follows that of the reference AA structures and reaches a steady value after 100 ps. Energy-minimized structures from the baseline method, on the other hand, settle at significantly higher energies, indicating badly initialized structures that get trapped in local minima with high energy barriers.

D. Large-scale structural features
To further evaluate the large-scale structural features, we turn to the pair correlation function, g(r). Fig. 5i-j focuses on non-bonded carbon pairs. We see excellent agreement between the reference AA g(r) and the DeepBackmap results for both temperatures. This clearly indicates that the local packing of the polystyrene chains is well reproduced, even for state points that were not used during training. As expected, the baseline method does not reproduce the pair correlation satisfactorily and fails especially in the crystalline phase.
Beyond the pair statistics, we wish to probe the accuracy of the reconstruction at higher order. We build a two-dimensional map representing proximity relationships between condensed-phase structures. We focus on the local environment around each backbone carbon atom that directly links to a side chain (i.e., every other backbone carbon). The pairwise distance between two such environments is encoded using a similarity kernel based on SOAP representations. 37 Hydrogens are omitted from the representation. To compare two structures we compute a covariance matrix containing all the pairwise distances between atomic environments, followed by a regularized entropy match kernel. 38 We further apply Sketchmap to obtain a reduced-dimensional projection of the conformational space. 39,40
Fig. 6 displays a number of clusters that correspond to different environments. The low-temperature reference data shows a single cluster (Fig. 6a in black), corresponding to the β phase, while the high-temperature reference shows more diversity (i.e., α, amorphous phase, and others). DeepBackmap overlaps significantly with the reference points at both temperatures, highlighting the high structural fidelity. This is not the case for the energy-minimized structures of the baseline model, as they cover different areas of the low-dimensional map. The baseline also fails to reproduce the correct number of clusters: at both temperatures the baseline model displays three to four clusters, highlighting a lack of temperature sensitivity.
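The structure-comparison step described above (an environment-similarity matrix followed by a regularized entropy match) can be sketched with Sinkhorn iterations. The sketch takes a precomputed matrix `C` of environment similarities in [0, 1] as given; computing SOAP descriptors is out of scope here. The regularization strength `gamma`, the iteration count, and the function name are illustrative assumptions, not values from this work.

```python
import math


def rematch_kernel(C, gamma=0.1, n_iter=200):
    """Regularized entropy match between two structures.

    C[i][j] is the similarity between environment i of structure A and
    environment j of structure B. An entropy-regularized matching P with row
    sums 1/n and column sums 1/m is found by Sinkhorn iterations, and the
    kernel is sum_ij P[i][j] * C[i][j].
    """
    n, m = len(C), len(C[0])
    K = [[math.exp(-(1.0 - C[i][j]) / gamma) for j in range(m)] for i in range(n)]
    u = [1.0 / n] * n
    v = [1.0 / m] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the target marginals.
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))


# Two structures whose environments match one-to-one give a kernel close to 1.
identical = rematch_kernel([[1.0, 0.0], [0.0, 1.0]])
```

Small `gamma` approaches a best one-to-one matching of environments, while large `gamma` approaches a plain average over all environment pairs.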

V. CONCLUSIONS
In this study we propose a new backmapping scheme based on deep neural networks. The model inserts atomistic details based on large-scale structures from a coarse-grained snapshot. To this end we use a conditional generative network where the coarse-grained information is used as an auxiliary input. We train our model, DeepBackmap, combining an adversarial loss function with a physical prior. The method is scalable to arbitrary system sizes since only local information is used. Our method is able to generate well-equilibrated high-resolution structures of condensed-phase systems. Critically, and unlike current methods, our approach does not need molecular dynamics (MD) simulations to yield the correct Boltzmann distribution.
We applied our methodology to a complex condensed-phase system made of syndiotactic-polystyrene chains. The model displays remarkable transferability properties: while trained solely on high-temperature melt configurations, DeepBackmap performs well at significantly lower temperatures, where the system is in a crystalline state. This indicates that the local correlations learned by the model are transferable across different state points, aided by the physics we incorporated into the GAN.
We rationalize these remarkable features in terms of scale separation: the large-scale features are encoded in the coarse-grained configurations, while the model only needs to generate equilibrated local correlations. Local features are less affected by temperature, since the underlying covalent interactions operate primarily on an energy scale significantly larger than $k_\mathrm{B} T$. As such the backmapping operates on two different sources of information: (i) the conditional coarse-grained configurations and (ii) the learned local correlations. Most of the temperature dependence is carried by the former, such that DeepBackmap can produce an accurate Boltzmann distribution across a phase transition from training at a single temperature.
Beyond the evident advantages of generating equilibrated molecular structures, our approach offers the perspective of a tighter integration of multiscale models: the information of the coarse-grained resolution is efficiently recycled into the higher resolution. Avoiding unnecessary equilibrations upon upscaling will help connect models at different scales, an important task at the dawn of the exascale computing era.

FIG. 1. (a) Backmapping consists of reintroducing missing degrees of freedom from a coarse-grained to an atomistic resolution. (b) DeepBackmap generates Boltzmann-equilibrated atomistic structures conditional on the coarse-grained configuration using an adversarial network. We apply it to the backmapping of a condensed-phase molecular system made of polystyrene chains.

FIG. 2. Adversarial autoregressive approach: The generator, G, sequentially samples atom positions conditional on the CG structure and the existing atoms. A critic network, C, estimates the discrepancy between reference and generated atoms.

FIG. 3. (a) Representation and conditional input. Existing atoms and CG beads are split into separate channels according to their atom/bead type. In addition, the atomic information is further split in terms of intra- and intermolecular interactions. All channels are used as input for the generator network, G. (b) Recurrent training. Starting from an atomistic configuration taken from training data (black), the predicted atoms (red) are added to the local environment description for predicting the next atom in the sequence.

FIG. 6. Low-dimensional structural space of condensed-phase configurations at (a) T = 313 K and (b) T = 568 K. For each panel, snapshots are backmapped from identical coarse-grained configurations, highlighting the overlap between reference and DeepBackmap, and the disconnect from the baseline method.