Simulation-ready graphene oxide structures with hierarchical complexity: a modular tiling strategy

Graphene oxide (GO) sheet structures are highly variable and depend on preparation conditions. The use of molecular simulation is a complementary strategy to explore how this complexity influences the ion transport properties of GO membranes. However, despite recent advances, computational models of GO typically lack the required complexity as suggested by experiment. The labor required to create such an ensemble of such structural models with the required complexity is impractical without recourse to automated approaches, but no such code currently can meet this challenge. Here, a modular tiling concept is introduced, along with the HierGO suite of code; an automated approach to producing highly complex hierarchically-structured models of GO with a high degree of control in terms of holes and topological defects, and oxygen-group placement, that can produce simulation-ready input files. The benefits of the code are exemplified by modeling and contrasting the properties of three types of GO membrane stack; the widely-modeled Lerf–Klinowski structure, and two types of highly heterogeneous GO sheet reflecting differing processing conditions. The outcomes of this work clearly demonstrate how the introduction of the complexity modeled here leads to new insights into the structure/property relationships of GO with respect to permeation pathways of water, ions and molecular agents that are inaccessible using previously-considered models.


Introduction
Graphene oxide (GO) is a material of growing relevance due to its versatility in applications including biosensors [1], water filtration [2][3][4], photovoltaics [5][6][7], supercapacitors [8,9], among others [10][11][12]. In each case, the development, adaptation, and translation of new and versatile applications of GO need to be informed by structure/property relationships, provided by detailed atomic-scale structural models of the individual GO sheets. Additionally, how these individual sheets are arranged relative to each other in the condensed phase is an important, and sometimes neglected, aspect of GO. However, creation of such structural models is challenging, primarily because GO is a berthollide material; i.e. GO does not have a fixed stoichiometry, supporting different ratios of carbon, oxygen, and hydrogen (all of which can be referred to as 'GO'). This means that the term 'GO' represents a family of materials, in which GO has a complex and variable structure. Therefore, no two individual sheets of GO necessarily have an identical structure in terms of their chemical composition, the spatial arrangements of the atoms, and their chemical bonding, although an ensemble of such GO sheets will recover observable sample-averaged properties such as the overall ratio of oxygen to carbon content.
Atomistic-level detailed models of GO are therefore challenging to produce from experimental efforts alone, and computational approaches can provide complementary information. Despite this, currently there is no systematic procedure or algorithm for readily generating the highly complex, hierarchical, and defective atomic-scale structures of GO, despite excellent recent progress in this area [13][14][15][16]. Current shortcomings in understanding regarding the complexity and variability in atomic-scale structure of GO hinders advancement of new insights leading to structure/property relationships. To address this, here we report a hierarchical approach for readily producing complex atomistic structures of GO sheets and introduce a new algorithm, HierGO, for accomplishing this.
In terms of the construction of complex structural models of GO and their use in large-scale molecular dynamics (MD) simulations, the GO structure is typically simplified. Most commonly, the intact 2D bonding network of graphene is functionalized with oxygen-bearing groups as guided by variations of the Lerf-Klinowski (LK) model. In the LK model, both faces of the graphene plane feature epoxide and hydroxyl groups distributed randomly and somewhat uniformly across the basal surface, with carboxyl and carbonyl groups present at the sheet edges [17,18]. There are instances where the sheets were functionalized with hydroxyls only [19][20][21][22]; hydroxyl and epoxy only [23][24][25][26][27]; and hydroxyl, epoxy and carbonyl or carboxyl [13,14,[28][29][30]. Such structural models typically lack key features thought to be intrinsic to GO, namely a non-uniform spatial distribution of oxygen content and the presence of both holes and topological defects in the graphene lattice. Such features have been experimentally identified in GO [31][32][33][34][35], specifically transmission electron microscopy (TEM) of GO indicates regions of substantial heterogeneity, including point and topological defects, larger holes, and areas of up to >5 nm × 5 nm of pristine graphene, with sample heterogeneity dependent on preparation method [31][32][33][34][35].
To elaborate, the oxygen content and defects on GO can be highly dependent on the preparation method and process history [36,37], which is not reflected in the LK model. For example, GO can be annealed or chemically reduced to form reduced GO, or subjected to base treatment to form base-washed GO, which correspond to a lower or higher oxygen content than GO, respectively [38]. More specifically, TEM studies of GO have revealed a wide variety of defect sizes, defect densities, and proportions of pristine graphene regions relative to oxidized regions, depending on the annealing environment, time, and temperature [39,40]. Raman spectroscopic studies have measured the average lateral spacings between defects, indicating a range of distances [41,42]. Additionally, the presence of both holes and topological defects in GO sheets are indicated as key features for the aqueous GO membranes, since holes have been suggested to increase GO membrane permeability [43], and topological defects have been indicated as sites for oxygen retention postannealing [35]. In addition to holes, the presence of internal ruptures (tearing) in the basal plane of GO was recently confirmed using nuclear magnetic resonance (NMR) experiments [44]. How these defects affect the GO membrane structure in aqueous media remains unclear, and all are possible exploitable variables for the study of advanced GO materials.
Modeling studies of these effects are few in number. Sarvestani et al, recently reported simulations probing the effect of idealized nanopore shapes on ion and water transport through a single (rigid) graphene sheet [45]. Fakhraee and Akhavan reported similar simulations using oxidized nanoporous graphene membranes under both strained and non-strained membrane conditions [46]. However, explorations of these broader effects in GO sheets and membranes remains unpublished to date.
Regarding previous atomistic simulations that considered some form of heterogeneous sheet functionalization, most of these studies focused on heterogeneous oxidation on different flakes and/or creating patches of pristine graphene on GO sheets [13,14,19,26,28,[47][48][49][50][51], in which none of these studies modeled sheets that possessed holes and topological defects. Notably, Williams et al, investigated the structure of GO membranes in vacuum and in liquid water [13,14], and studied the permeation of potassium ions through these hydrated GO membranes [47]. In addition, Suter and Coveney [51] developed an algorithm [15] based on machinelearning to derive structural models that incorporated the two-phase (oxidized and un-oxidized) character of GO on the association of discrete GO flakes rather than modeling a membrane, Suter and Coveney [51] subsequently used this algorithm to derive structural models that incorporated this two-phase (oxidized and un-oxidized) character of GO. In contrast, a number of previous studies reported modeling of water and ion transport through an idealized GO nanochannel (typically bounded by two perfectly planar, fixed, parallel sheets) featuring spatial variations on oxygen content and pristine graphene [26,48,49,52,53]. Collectively, these simulations studies indicated that variation of the oxygen content of the sheets influenced transport of water-/ions through the GO structures. However, with the exception of the work of Williams et al and Coveney and co-workers, these studies modeled the membrane stacking by freezing or restraining the two GO sheets comprising the nanochannel, such that the inter-sheet separation was an idealized (and constant) value as a function of traversal across the channel.
The atomic scale structuring of each GO sheet in a membrane will impact on the next level of structural hierarchy, chiefly characterized by the vertical inter-layer spacing and the lateral inter-sheet gaps. From a modeling perspective, most previous studies of GO membrane structures have been based on idealized stacks, i.e. by vertically or laterally stacking two or more identical LK model GO sheets [19, 21-25, 27, 29, 30, 50-55]. This naturally results in a certain uniformity in inter-layer spacings (in terms of both transverse variation in this spacing, i.e. laterally across a given inter-sheet pairing, and vertical variation, i.e. between different inter-sheet pairs). In simulations of heterogeneous GO nanochannels, several previous studies explored more complex stacking geometries composed of GO flakes with different oxidation levels and pristine graphene regions [13, 14, 20, 26, 28, 47-49, 56, 57]. In all cases, the authors reported that heterogeneity of inter-layer spacing across stacked GO flakes was critical controlling the performance of GO membranes. However, we reiterate none of these studies considered the impact of holes, ruptures, and topological defects, all of which are anticipated to affect water and ion transport.
Simulations of ensembles of chemically-and structurally-diverse, large GO sheets that simultaneously accommodate oxygen heterogeneity, topological defects, tears, and holes have not yet been reported. This may be due in part to the fact that generating large highly heterogeneous GO sheets can be challenging and time consuming to accomplish using generic structure building packages. In terms of methods for automatically generating heterogeneous GO sheet structures that are amenable to molecular simulation, very little has been reported [15,16]. Furthermore, it is also noted that the current codes (detailed below) produce GO structures, but the transposition of these as-produced structures into 'simulation ready' input files is an additional non-trivial step that is left to the individual user. For example, Sinclair and Coveney developed code for generating heterogeneous GO structures based on machine learning approach [15], producing large GO sheets featuring oxidized and un-oxidized regions. Similarly, Muraru et al, developed code, referred to as GOPY, which can automatically generate structures of GO, reduced GO and other graphene derivatives. [16] In both codes, GO sheets were functionalized with oxygencontaining groups at the basal plane and carboxyls at the edges, but without holes, ruptures, and/or topological defects. There is currently no such code for the production such GO structures. Therefore, to model how these features affect water and ion transport in GO membranes, an automated approach to produce such GO structures is needed, because every sheet in the GO stack in the simulation can be structurally and compositionally different, and the labor required to generate the structure of each such GO sheet without recourse to such a code is impractically excessive.
To address this in the current work, a modular tiling concept is introduced to create hierarchical complexity in GO, along with a flexible, automated algorithm, denoted HierGO, and a computational workflow to generate large-scale highly heterogeneous GO sheet structures that incorporate spatially heterogeneous oxidative content, topological defects, ruptures, and holes. As a key practical feature, this workflow also transposes the atomic coordinates of these highly-complex GO sheet structures from a PDB format into 'simulation ready' input files, to enable the simulation community to make rapid progress in this area. This open-source code can produce structures of GO sheets with tuneable size; tuneable topological and hole defect size spatial distribution, density and locality; and tuneable oxidative group type, quantity and locality.
The outputs of the code were used to reveal significant differences in a range of characterization metrics for a simple stack of four GO sheets in liquid water made from three different types of sheets: (a) the typical "LK" type of sheet; (b) a "fresh" type of sheet, which corresponds to a freshly created GO sheet (not subjected to harsh post-processing), containing defects and a spatially-variable oxygen distribution; and (c) an 'aged' type of sheet, reproducing the result of harsh post-processing, resulting in highly ragged edges, holes, topological defects, and reduced oxygen content. The outcomes provide clear motivation for the need of this modular tiling strategy and computational workflow, for generating nonidealized models of GO membranes for future study of water and ion transport.

Overview
The strategy of the HierGO code is to automatically create a large GO sheet structure in protein databank (PDB) format by laterally assembling any number of smaller structural sub-units, referred to herein as tiles. The initial tile template is an approximately square fragment of idealized graphene of userspecified dimension. Each GO tile can then specified by oxidation level, proportions of the type of oxygen groups (epoxies, hydroxyls, and so on), spatial distribution of oxygen groups, spatial location of oxygen groups, concentration of carbon vacancies and hole size distribution, concentration of topological defects, and presence of ruptures (tears in the basal carbon plane). Figure 1 summarizes the concept and work-flow of the HierGO algorithm. Although square sheets are provided as examples, the algorithm can create a rectangular GO sheet of any aspect ratio. If desired, the HierGO suite of code can then automatically transpose these as-produced GO sheet PDB files into simulation software-compatible input files (structure, force-field, and topology files).
The utility of the concept is demonstrated via the creation of three different types of GO sheets of 10 nm × 10 nm dimension: 'LK' sheets, corresponding with a generic LK structure; 'fresh' sheets corresponding with freshly created GO that has not been subjected to harsh processing; and 'aged' sheets, corresponding with GO that has been exposed to extreme thermal and/or chemical processes. A simple four-sheet vertical stack is created for each of these three types of GO scenario to quantify and contrast their properties such as inter-layer spacing, levels of hydrogen-bonding, and water retention. In the LK stack, each sheet is chemically-, structurally-and orientationally-identical (reflecting typical conditions of many previously published studies), whereas the fresh and aged stacks comprise four chemically-and structurally-distinct GO sheets. MD simulations were used to equilibrate the structures of these stacks and the resulting trajectories were analyzed with respect to the properties mentioned above. For brevity, only a concise outline of each step in the process, the subsequent MD simulations, and their analyses are summarized herein; full details are provided in 'Computational Methodology' , supporting information.

Tile creation
The process starts with choice of the size of tile (the basic structural sub-unit), which is defined as an integer number of repeats of the 2D unit cell of pristine graphene in the lateral (x and y) dimensions. Next, two defect types can be incorporated: vacancy defects (holes) and/or topological defects.
Holes were generated through the random selection and deletion of an initial defect atomic site, followed by the random selection and deletion of a neighboring carbon of the initial defect site. Hole sizes were determined for a given tile size based on a two user-specified parameters; vacancy ratio and size distribution. The term 'size distribution' refers to the sizes of holes used in the geometry, for which there are four options: small, medium, large, and mixed. A set of holes can then be generated randomly to fulfill the nominated size distribution, and a random site on the lattice (which does not already contain a defect) was chosen to place the hole defect. Examples of holes created by the code are illustrated in figures S1(a)-(c), supporting information, and examples of the code pre-sets for hole size distributions is provided in figure S2, supporting information.
Topological defects (as identified from the existing literature, including recent work [35]) were generated from rotations of atoms in the lattice (examples provided in supporting information figures S1(d) and (e)). Unlike the vacancy defects, the topological defects included were limited to the two-atom (single C-C bond) rotation i.e. a Stone-Wales defect [58], supporting information figure S1(d), and a 24 atom rotation (referred to herein as a miniature grain boundary, supporting information figure S1(e)). For the placement of topological defects, a lattice site was randomly chosen, and for the single-bond rotation a neighboring site was randomly selected and both atoms subsequently rotated. For the 24 atom rotation, a site was randomly chosen and the central atom of a neighboring hexagon was used as the center of a nearest neighbor search for the nearest atoms within 1.8 Å, and the subsequent atoms were used in the rotation. Unlike for the vacancy defects, sites on the periodic boundary were excluded, due to the possibility of generating nonphysical geometries when tiles are subsequently connected together to form a sheet. Examples of incorporation of multiple topological defects are provided in figures S3 and S4, supporting information.
Following the optional steps of defect incorporation, oxygen content was added to the tiles in a similarly modular way, where the amount of each functional group can be set, with certain presets provided for convenience. The functional groups considered were chosen on the basis of data from a recent annealing simulation study [35]. The oxygen containing groups retained after annealing at 300 K-2500 K were di-hydroxyl, surface hydroxyl, edge hydroxyl, epoxy, ether, and carbonyl; thus these were the groups initially considered. The presence of carboxylic acid was also included on internal edges. In the HierGO code, the user can manually specify percentages for each of these functional groups, or use a pre-set ageing (processing) profile which will automatically allocate the amounts of oxygen as listed in table S1, supporting information. These temperatures (300 K also denoted as Fresh, 600 K, 900 K, 1200 K, 2500 K) approximately capture the number and the proportion of types of oxygen-bearing groups retained at these annealing temperatures as reported by Marsden et al [35]. Carboxylic acid was also included in this group, but only for addition to internal edges on the tiles. Installation sites can be selected randomly, but hydroxyls are prevented from neighboring one another, both to ensure the conservation of the dihydroxyl specification and to prevent hydrogen overlap (which can complicate the subsequent steps of the overall process). After all functional groups have been added, any remaining two-fold coordinated carbons are hydrogenated. As groups are placed, the number of oxygens above and below the sheet is tracked to ensure the same number of oxygens both above and below the sheet. Control via the HierGO code is also available to specify locations of oxygen content, for example to preferentially decorate hole edges (as per the findings reported by Marsden et al [35]), as illustrated in figure S5, supporting information. An additional option exists to create oxygen-decorated ruptures (tears) in the basal plane (shown in figure S6, supporting information). The ability to confine the added oxygen content to a pre-specified lateral radius around a given installation site (referred to in the code as 'bunching') is also available, illustrated in figure S7, supporting information.

Tile assembly for sheet creation
To create a heterogeneous GO sheet, a set of individual tiles with the desired vacancy amounts, topological defect content, and oxygen content must first be created. Once this set of tiles is available, any rectangular (including square) array of same-sized tiles can be combined using HierGO's automated tile-stitching code, which begins by moving the tiles adjacent to one another and expanding the periodic boundary conditions, resulting in the desired heterogeneous sheet, with local structuring on the tile scale. If an input oxygen percentage is specified, oxygen in the form of carbonyl and carboxylic acid groups are added to two-fold coordinated carbons located on the sheet edges. Once this threshold oxygen amount is reached, or alternatively if no oxygen amount is specified, the remaining two-fold coordinated carbon atoms are terminated with hydrogen to produce a fully terminated, heterogeneous sheet. The tile generation and tile assembly processes result in the production of a set of 3D atomic coordinates in PDB format for the resultant GO sheet.

Conversion into simulation-ready inputs
The next step in this process is the automated conversion of this PDB file, which may feature considerable chemical and topological complexity, into a set of input files corresponding to a specific fixedtopology force-field and for use in a specific simulation package. In this instance, the code was written to achieve conversion from PDB format into GROMACS [59] input files, using the OPLS-AA force-field [60], although extension of the code is possible to accommodate other simulation software and force-fields.
The as-produced GO sheets were initially visualized using visual molecular dynamics (VMD) [61] to check for any unusual bond connectivities. Incomplete system topologies (.top) containing atom, bond, angle and dihedral (proper and improper) information of the modeled GO structures were generated using ToPoToolS version 1.8 [62]. Atom types, charges and masses were then replaced with corresponding types, charges and masses described in the OPLS-AA force field [60]. Since the OPLS-AA force field uses the Ryckaert-Bellemans potentials in combination with 1-4 pair interactions for dihedral interactions in GROMACS [63], dihedral functions were changed from 1 to 3. Furthermore, 1-4 atom pairs obtained from the dihedral connectivity entities were also added to the topologies. The finalized topologies were saved as included topology (.itp) formats, and are provided in the supporting information.

MD simulation details
Classical MD simulations of aqueous GO sheets were performed using GROMACS version 2020 [59]. Collectively, three types of GO stacks were considered: LK, fresh, and aged sheets. All systems comprised four GO sheets of lateral dimensions ∼150 Å × ∼130 Å. These four sheets were initially separated by an inter-layer spacing of ∼8 Å. The stack of GO sheets was placed in a periodic cell of dimensions 180 Å × 180 Å × 90 Å, oriented in the x-y plane, such that the stacking direction was aligned with the z-axis. The fresh, aged, and LK GO sheets were solvated by 83 150, 85 598 and 82 562 water molecules, respectively. GO sheets and liquid water were described using the OPLS-AA force field [60] and the TIP4P water model [64], respectively. The combination of these force fields have been previously reported to reproduce conformational and hydration energetics of hydrocarbons consistent with electronic structure theory calculations [60], implemented in previous GO studies [22,[64][65][66][67][68]. Details of the OPLS atom types used for modeling the GO sheets are provided in table S2, supporting information. For all systems, to ensure ease of comparison across the three types of stack, a carbon atom located within the center of the bottom GO sheet was kept fixed, while the positions of remaining bottom sheet carbons were restrained to the x-y plane. Periodic boundary conditions were applied in all three dimensions. Snapshots of representative initial configuration of the various GO sheets is provided in figure S8, supporting information.
Production runs of 10 ns NVT were performed, with a target temperature of 300 K maintained using the Nose´-Hoover thermostat [69]. Equations of motion were integrated using the leapfrog algorithm [70], using a time step of 1 fs. Lennard-Jones non-bonded interactions were used, along with the particle mesh Ewald electrostatic summation [71], with a similar cutoff of 12 Å. Frames were saved every 1 ps. For all systems two independent NVT runs were performed, with different initial randomly-generated velocities. Geometry optimization of the topological defect examples (supporting information figures S5 and S6) was performed using the LAMMPS software package [72] using the reactive force field ReaxFF [73,74].
Trajectory analysis of GO stack metrics included quantification of inter-layer spacing along the z-axis stacking direction, water retention between the layers, and hydrogen-bonding between layers and with liquid water. The inter-layer spacings were determined using two approaches: time-averaged in the x-y coordinates as typically reported in previous studies, and, time-averaged and discretized over the x-y plane (of approximately 25 Å grid spacing) to highlight the lateral variations in the inter-layer spacings, producing a 2D color map of local inter-layer distances. Water retention analysis required careful attention; to define the upper and lower boundary of each layer gap, a closed surface was defined using z coordinates of all carbon atoms belonging to the upper and lower sheets, respectively. The surface was computed using the SciPy interpolate.griddata python function [75], using a spatial grid sampling every 1 Å and interpolating values between data points by taking the z value of the nearest carbon data point. This procedure creates two closed surfaces that capture local deformations of the GO sheets, and when coupled with the x and y boundaries as specified above, define an inter-layer gap region. A given water is therefore considered to be inter-layer if the position of its oxygen nucleus lies within the defined gap region.

Results and discussion
The overall concept of HierGO is an automated process by which the user can generate the 3D atomic coordinates of a complex GO sheet structure and convert this into a simulation-ready input files, by first creating a set (one or more) tile structures. Tiles are defined here to be the basic structural sub-unit. These tiles can then be connected together in different spatial arrangements to form a chemically and structurally heterogeneous GO sheet which is outputted in PDB format of atomic coordinates. In creating each tile, the user can specify the oxygen content and exert control over the spatial arrangement of these oxygenbearing groups, such as the 'bunching' option, as described in the Methods, the density of hole defects and their size distribution, and the density of topological defects. This allows the user to create a set of tiles with huge variation in individual characteristics, but when connected together in a sheet configuration, can conform to specified ensemble-averaged traits, such as the average oxygen content.
The user first creates a tile by specifying the tile size, a percent carbon vacancy, and hole size distribution ('small,' 'medium,' 'large' or 'mixed') to produce a tile such as in figure 1(a) (10% vacancies; mixed hole sizes). The user can then also include topological defects by specifying the percentage of Stone-Wales defects and/or 24-atom rotation defects, to produce a tile such as in figures 1(b) and (c). Following specification of defects, the user can then choose oxygen decoration parameters, including 'fresh' , where a hydroxyl:epoxy ratio is specified, or 'aged' , where options 1-4 correspond to an increasing in annealing intensity as described in the Methods. Besides illustrating how oxygen and hydrogen can be added to tiles with different defect contents, figures 1(d)-(g) provide examples of the different possible decoration options: in figure 1(d) a fresh tile is decorated using the 'bunching' option, in figure 1(e) a fresh tile is decorated with all groups randomly placed, in figure 1(f) an aged tile is decorated with oxygen preferentially decorating hole/sheet edges, and in figure 1(g) an aged tile where carbonyl, carboxylic acid and nonepoxide ether groups are decorated at defect edges, but hydroxyl and epoxy groups are added across the basal plane. It is noted here that GO sheet hydrophilicity is, to first order, expected to correlate with both the overall oxygen content of the sheet and the spatial distribution of those oxygen atoms (i.e. due to bunching or spatial depletion of oxygens). Over a sheet of sufficient size, the definition of GO sheet hydrophilicity at this level of complexity therefore becomes a spatially-localized concept.
Once the desired tiles have been created, any number of these can be connected via their edges to form heterogeneous sheets, either a square array, or with an array aspect dimension specified by the user. To exemplify the utility of the modular tiling approach, and to investigate the dynamics of more complex GO stack models in the presence of liquid water, two new types of GO sheet model were employed, 'fresh' and 'aged' (as explained in the Methods) and contrasted against a typical conventional LK-type GO sheet model, summarized in figure 2. In the LK model, figures 2(a) and (b), a 147.6 × 127.8 Å sheet was functionalized with a spatially randomized distribution of hydroxyls and epoxy groups, with an overall hydroxyl to epoxy ratio of 1:1, with no holes or topological defects. In the fresh (example in figures 2(c) and (d)) and aged (example in figures 2(e) and (f)) models, a total of nine tiles (49.400 × 42.782 Å) was created and connected in different arrangements in a 3 × 3 square array. Oxidation in both the aged and fresh GO models was based on experimental and simulation results from previous simulations [35]. None of the GO models predict the oxygen distribution of any experimental GO structure, rather the models explore possible structures within available experimental data.
To generate the GO stack, four GO such sheets were vertically assembled in a face-to-face arrangement. In the LK case, the same GO sheet (figures 2(a) and (b)) was used in each layer, reflective of typical GO simulations reported previously. In contrast, for the fresh and aged stacks, four different GO sheet structures, each with the same overall specification of oxygen content and defect content, were stitched together from different spatial permutations of their respective nine-tile basis set. The structures of the four sheets and their location in the stack is provided in figure S8, supporting information.
In the fresh case, vacancy content for the tiles varied from 0% to 20%, whereas defect sizes were selected from the 'small-medium' range and oxygen content varied from 0% to 60%, with a sheet-averaged oxygen content of 30%. Only hydroxyl and epoxy groups were added, in a ratio of 1:1, and no distinction was made between defect edges and basal carbon when adding the functional groups. For the aged case, vacancy content for the tiles varied from 0% to 30%, defect sizes were from selected from the 'small-mixed' distribution, the proportion of topological defect atoms ranged from 0% to 30% (Stone-Wales only) and oxygen content varied from 0% to 30%. The functional groups added reflected the 'High' aging profile as outlined in table S1, supporting information, with functional groups being preferentially added to defect edges.
In each of the three cases (LK, fresh, and aged) the four sheets were placed with initial inter-layer spacings of ∼8 Å in liquid water and subjected to standard MD simulations (details in section 2). Representative snapshots of the three types of GO stack and a summary of the various structural features are provided in figure 3. The three stacks featured common but key differences, in terms of structural features. Specifically, all GO sheets contained both epoxy and hydroxyl groups. However, the aged sheets featured carbonyls, carboxylic acid and ethers, in addition to epoxy and hydroxyl groups. The LK and fresh sheets had equivalent oxygen content (30%) with similar percentage of epoxies (∼50%) and hydroxyls (∼50%), whereas the aged sheets had lower average oxygen content (15%) with a different proportion of functional groups, as detailed in figure 3. Both fresh and aged sheets also contained topological and vacancy defects of different sizes. Overall, based on the structural features and the stacking of the GO sheets, the trend in heterogeneity of the GO systems was suggested to be of the order LK < fresh < aged. Our multilayered GO stack models in aqueous water captured a broad range of experimentally reported characteristics, significant for predicting and understanding realistic GO membrane properties. Figures 4(a) and (b) provide the time-averaged, laterally-averaged inter-layer spacings (along the stacking direction, z) and the number of waters within the respective gaps of the multilayered GO sheets as function of simulation time for the replica 1 simulations (two replica simulations were performed for each case). Corresponding analyses for the replica 2 simulations are provided in figures S9 and S12, supporting information. Consistent with previous simulations of GO membranes in liquid water [13,14], these data indicate that different structural features of GO sheet, including oxygen contents, types and  distributions, vacancy and topological defects can influence the inter-layer vertical spacing and number of waters within the GO membranes. In terms of the laterally-averaged inter-sheet distances for the LK, fresh and aged GO stacks ( figure 4(a)), our data suggest minor differences between the LK and the fresh systems. However, significant differences in the laterally-averaged inter-sheet spacings were found between the aged and both the LK and fresh GO sheets.
For both the LK and fresh sheets, the laterallyaveraged inter-layer spacings within the various gaps as function of time was almost constant throughout the simulation. However, for the fresh sheets, the spacing within gap 3 decreased linearly as a function of time but remained constant after ∼6 ns. In addition, the laterally-averaged inter-layer distances of the respective gaps for LK and fresh sheets were broadly comparable (7.1-7.5 Å) from time >6 ns. These outcomes were found for both replica runs and can be attributed to the similar oxygen/carbon ratio of both the LK and fresh sheets.
For the aged GO sheets, the laterally-averaged inter-layer spacings decreased as a function of simulation time (until after 8 ns). Compared to the LK and fresh stacks, this drastic variation in the laterallyaveraged inter-layer spacing can be accounted for by several structural features, including low oxygen content, the presence of topological vacancies and defects. To elaborate, we suggest that; (a) low oxygen content supports a lower degree of hydrogen bonding interaction between water and GO sheets, (b) water molecules initially present in the inter-layer gaps diffuse through the holes, and (c) topological defects increase the dynamism of GO sheets, as evidenced by the root mean square deviation (RMSD) of the sheet movement, provided in figure S10, supporting information. Further, in comparing the laterallyaveraged inter-layer spacing for the respective gaps in the multilayered aged sheets, replica 1 showed distinct distances ( figure 4(a)), whereas replica 2 supported a similar distance for gaps 2 and 3 ( figure  S9(a) supporting information). These findings also suggest a high degree of dynamism of the aged GO sheets in liquid water, which is supported by the localized RMSD of the GO sheets (figure S11 supporting information).
Despite the similarity in the laterally-averaged inter-layer distances for the LK and fresh GO stacks, figure 4(b) shows that the number of waters between the gaps (i.e. 1, 2 and 3) were significantly different. For the LK sheets, the number of waters in each gap was comparable, corresponding directly to the trend in the laterally-averaged inter-layer spacing. On the contrary, the number of waters in each gap in the fresh GO stack was clearly different. This indicates that the level of hydration within the gaps cannot be predicted a priori based on the laterally-averaged inter-layer spacings, since the dynamics of water within the LK and fresh GO layers might be different. This outcome may be linked to the co-existence of different regions of spatially localized inter-layer distances within the respective gaps, which mediates different localized hydration levels. To support this claim, we further analyzed the local inter-layer spacing for the various gaps between the modeled sheets.
In the case of the aged GO stack, the number of waters within the gaps decreased as the simulation progressed, similar to the laterally-averaged inter-layer spacings. This indicated that layer spacings decreased as water diffused out of the gaps. After 8 ns, the number of waters within the gaps between the aged sheets was lower compared to both the LK and fresh stacks, regardless of the fact that the aged system had a greater number of waters initially present than the fresh and LK systems (2448 and 3036 more water molecules, respectively). This indicates that water molecules do not prefer to accumulate in the membranes of aged GOs, which may be due to low number of hydrogen bond interactions with the sheets (figure 4(c)).
The localized inter-layer spacings of the respective gaps for both fresh and aged stacks showed significant variations in gap 1, 2 and 3 (both amongst and between each gap), figure 5. In contrast, the localized inter-layer spacings of the gaps in the LK stack were almost completely homogeneous. These findings clearly support the fact that the multilayered LK sheets comprised an idealized stacking geometry, whereas the fresh and aged stacks supported the formation of more complex and variable stacking geometries, as also summarized in figure S8, supporting information. In addition, it is emphasized that the heterogeneity in the localized inter-layer spacings may have induce heterogeneity in the localized distribution of water in a given gap. This can ultimately increase or decrease the number of water within the various gaps, irrespective of the similarity in the laterally-averaged inter-layer spacings. Overall, the data indicate that the structural heterogeneity of the fresh and aged GO sheets and their complex stacking geometries are significant factors influencing the different levels of variation in the local inter-layer distances and hydration across the respective gaps. These variations can have significant implications for the transport pathways of ions and water through GO membranes.
To evaluate the effect of the structural characteristics of GO sheets on interfacial interactions, hydrogen bonds between sheets and water, and involving only sheets (inter-sheet) were quantified. The total number of sheet-water and sheet-sheet hydrogen bonds as a function of time for all three systems is provided in figures 4(b) and (c) (corresponding analyses for run 2 are provided in figure S12, supporting information). The ordering of the three GO systems in terms of sheet-water interactions was LK > fresh > aged, whereas that of inter-sheet interactions showed the opposite behavior.
Specifically, for interactions between water and the respective sheets, the LK sheets supported the highest number of hydrogen bonds, whereas the aged sheets featured the least number of hydrogen bonds ( figure 4(c)). On the contrary, the aged sheets supported the highest number of inter-sheet hydrogen bonds, whereas the LK system revealed a drastic reduction compared to both the aged and fresh sheets ( figure 4(d)). These findings can be accounted for by the variations in the 2D local inter-layer spacing (figure 5). As previously discussed, the LK model showed an almost homogeneous localized inter-layer spacing between the respective gaps. This results in uniform local hydration levels within the LK gaps, ultimately leading to uniform localized sheet-water interactions and less contact between the sheets in the stack.
For the fresh and the aged sheets, heterogeneity of the localized inter-layer spacing means that different regions can have either lower or higher amounts of accumulated water, depending on the size of the local inter-layer gap. In addition, due to the heterogeneous distribution of the oxygen groups, a given localized region with a wide inter-layer spacing and high accumulated water does not necessarily translate into a region with a high degree of hydrogen bond interactions. This is because the region may consist of pristine graphene, vacancies and/or low oxygen content. On the other hand, a localized region featuring a high oxygen content can have a narrow interlayer spacing and low accumulated water. Thus, for the aged sheets, the presence of more regions with relatively smaller inter-layer distances may correspond to more regions with relatively low accumulated water, resulting in enhanced inter-sheet contact and a reduced degree of water-sheet interactions. These data indicate that the formation of hydrogen bonds is a significant factor in the retention of water in the gaps. Overall, our simulations indicate that the structural characteristics of GO sheets can substantially affect their inter-sheet interactions, and therefore the pathway tortuosities within the stack which may be the critical factor for water and ion transport in GO membranes.
To probe the degree of water retention in the respective gaps between the GO sheets, the residence time of the individual water molecules within the gaps was quantified. Here, this is defined as the accumulated time spent between a specific gap (between a specified pair of GO sheets) by a water molecule, regardless of whether the molecule left (and/or reentered) the gap in the course of the simulation. The breakdown of the average total number of waters with high residence time, defined as >9 ns, in the various gaps is summarized in figure 6. For the three systems, these data reveal a decreasing trend in water retention, followed LK > fresh > aged.
Despite the fact that both LK and fresh GO sheets have a similar chemical composition, the LK stack supported a greater number of high residence waters in the stack compared with the fresh system. In contrast, the aged sheets featured a drastic reduction in the number of high residence waters, compared to both the LK and fresh sheets. The findings suggest that the number of high residence waters is dependent on the hydrogen bonding interactions between water molecules in the gap and the GO sheets. This indicates that for a given localized inter-layer spacing, a higher number of accessible oxygen groups would lead to a greater degree of water-sheet hydrogen bond interactions, resulting in waters residing in the gap for a longer time. However, due to the heterogeneity of the localized inter-layer spacings (in the case of the fresh and aged stacks), oxygen groups located at regions with narrow gap spacings (∼3.8 Å-5.0 Å) may not be accessible to water. As such, more water would be accommodated at localized regions of wider interlayer spacings (∼7 Å-10 Å). This means that upon saturation of the available oxygen sites, the remaining weakly-bound waters can easily diffuse out of the gaps. In addition, the presence of holes in the fresh and aged sheets can serve as channels for these waters to leave the gaps. Consistent with previous studies [13,27,47,76], these data underscore that water dynamics in GO membranes can be influenced by the structural characteristics of the GO sheets.
Overall, the three distinct GO stacks studied in this work reveal how the introduction of substantial structural complexity and variety can have a significant impact on the resultant properties of the stack. The findings also reveal that global characterization metrics that have been typically reported in previous studies, such as the laterally-averaged inter-sheet spacing, can be misleading with respect to establishing structure/property relationships in GO. The ability to create complex GO structures, along with the new analyses presented here have critical implications for modeling, mapping, and manipulating transport pathways in GO membranes for water and ions. Key differences in the dynamical transport properties for the passage of water and ions for these three types of system will be explored in future for more complex stacking arrangements. Furthermore, the same structural GO complexities are key aspects in e.g. supercapacitor electrode applications and can now be explored using the hierarchical structural models produced by HierGO.

Conclusions
In summary, the HierGO suite of code, based on a modular tiling concept, is a user-friendly approach to create simulation-ready structural models of heterogeneous GO with significant hierarchical complexity. The structural characteristics of the GO structures, including: oxygen contents, functional group types, and spatial locations; hole and topological defects; as well as the size, can be tailored to the specifications of the user. The code can automatically convert the resultant PDB structure files into simulation-ready GROMACS topology and input. Example simulations of multi-layered GO stacks showcase how the current approach was capable of readily generating GO models with non-correlated oxygen distribution and structural heterogeneity, and vividly illustrate how incorporation of this complexity and variety into the stack model are critical key features of the GO stack, such as the variations in localized inter-layer spacings, the degree of water-sheet and inter-sheet hydrogen bonding, and the retention of water between the layers. Properties such as these are critical for conferring potential transport pathways both parallel and perpendicular to the sheet stacking direction. The structure and properties of the generic LK model were shown to be over-simplistic relative to the new structural models introduced here, indicating that use of the LK model leads to a biased evaluation of the performance of GO membranes in various applications relevant to the transport of water, ions, and molecular agent.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/IFM-molecular-simulationgroup/HierGO.

Code availability
The HierGO suite of code is open-source and available free of charge from our GitHub repository: https://github.com/IFM-molecular-simulationgroup/HierGO. This includes a comprehensive README file that explains all aspects of the code, along with a worked example for creation of an 'aged' GO sheet.

Acknowledgments
This work was supported by Australian Research Council (ARC) Discovery Grant DP190103273. T R W acknowledges the National Computing Infrastructure (NCI) Canberra, and the Pawsey Supercomputing Centre, supercomputing resource under NCMAS. Additional computational resources using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne are gratefully acknowledged. This Facility was established with the assistance of ARC LIEF Grant LE170100200.

Funding sources
Australian Research Council Grants DP190103273 and LE170100200.

Author contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.