Generating artificial displacement data of cracked specimen using physics-guided adversarial networks

Digital image correlation (DIC) has become a valuable tool to monitor and evaluate mechanical experiments on cracked specimens, but the automatic detection of cracks is often difficult due to inherent noise and artefacts. Machine learning models have been very successful in detecting crack paths and crack tips using DIC-measured, interpolated full-field displacements as input to a convolution-based segmentation model. However, training such models requires large amounts of data, and scientific data is often scarce because experiments are expensive and time-consuming. In this work, we present a method to directly generate large amounts of artificial displacement data of cracked specimens resembling real interpolated DIC displacements. The approach is based on generative adversarial networks (GANs). During training, the discriminator receives physical domain knowledge in the form of the derived von Mises equivalent strain. We show that this physics-guided approach leads to improved results in terms of visual quality of samples, sliced Wasserstein distance, and geometry score when compared to a classical unguided GAN approach.


Introduction
Fatigue crack growth (FCG) experiments are of significant importance to determine the lifetime and damage tolerance of critical engineering structures and components that are subjected to non-constant loads [1]. In recent years, digital image correlation (DIC) has been used to accompany and evaluate such mechanical experiments [2]. The DIC data serves as the basis for subsequent mechanical evaluation of fracture mechanical quantities like the stress intensity factors [3] and J-integral [4]. For this evaluation, the spatial location of the crack and especially the exact crack tip position is crucial. However, DIC data is subject to inherent noise and artefacts due to influences such as pattern quality, sensor noise, air movement, etc. [5], which makes this information difficult to obtain. To improve upon these issues, extensive work was done optimizing the pattern quality [6] as well as the DIC algorithm in order to obtain reliable measurements in case of inferior patterns [7] or under special external conditions [8]. In the context of fracture mechanics, convolutional neural networks have been successfully applied to solve the crack detection problem fully automatically [9,10]. These networks can deal with DIC noise and take the interpolated DIC-measured displacement fields as input to predict the crack paths and tips. However, for these powerful data-driven models to work reliably, they need a diverse set of training data. At the same time, experimental training data is scarce, since experiments are expensive, and manual labelling is extremely tedious and time-consuming. To address this issue, Strohmann et al. [9] added artificial training data in the form of finite element (FE) simulations. Nevertheless, simulations are idealized and lack the characteristic DIC noise.
Classically, synthetic DIC data can be generated by first creating an artificial speckle pattern on a digital image of the desired specimen. Then, an FE simulation is used to virtually deform the image. Finally, the deformed (speckle) image can be evaluated using a DIC algorithm (see, e.g., [11] and more recently [12]). This synthetic DIC data can for instance be used to assess systematic errors arising from different DIC techniques [11] or to understand calibration uncertainty [13]. In contrast to these approaches, our goal is to bypass this multi-step procedure altogether and directly generate large amounts of artificial interpolated DIC-like training data in a fast, simple, data-centric manner using machine-learned generative models. For instance, this interpolated DIC data can be used to improve the training of automatic crack detection models (cf. [9,10]) by delivering an advanced data augmentation method.
In the field of data-driven modeling, generative adversarial networks (GANs) have proven to be powerful data generators. GANs are a generative, unsupervised approach to machine learning based on deep neural networks trained using an adversarial process. Deep convolutional GANs (DC-GANs) produced state-of-the-art results in many generative and translative computer vision tasks such as image generation and style transfer [14][15][16]. However, training GANs typically requires large amounts of data, which are often not available in the scientific domain. For example, it is not possible to mechanically test an entire aircraft to generate training data. But without sufficient data, deep learning models are often unreliable and generalize poorly to domains not covered by the training data. To overcome this problem, efforts have been made to integrate fundamental physical laws and domain knowledge into machine learning models. Karpatne et al. [17] describe theory-guided data science as an emerging paradigm aiming to use scientific knowledge to improve the effectiveness of data science models. They presented several approaches for integrating domain knowledge in data science. Daw et al. [18] proposed a physics-guided neural network (PGNN) by adding a physics-based term to the loss function of a neural network to encourage physically consistent results. Karniadakis et al. [19] coined the term physics-informed neural networks (PINNs), a deep learning framework that enables the combination of data-driven and mathematical models described by partial differential equations. Yang et al. [20] combined this approach with GANs by adding a physics-based loss to the generator and, recently, Daw et al. [21] introduced a physics-informed discriminator GAN, which physically supervises the discriminator instead.
In this work, we generate artificial displacement data of cracked specimens using GANs. Our framework is based on the classical DC-GAN architecture. We incorporate mechanical knowledge by using a physics-guided discriminator. In addition to the generated displacement data, this discriminator receives the equivalent strain according to von Mises [22] derived from the real or fake (generated) displacements as an additional input feature, see Section 2 and Figure 1. This mechanically motivated physical feature guides the adversarial training process and leads to physically consistent generated data. The generated data samples can be used to increase data variation of given training datasets consisting of interpolated DIC displacements. Although this synthetic data is not labelled, it has the potential to improve supervised machine learning tasks, e.g. by using unsupervised pre-training [23] or label propagation [24]. In general, our method can be applied to generate interpolated DIC-like displacement fields of cracked specimens. In this paper, we focus on FCG experiments of a middle tension specimen manufactured from an aluminium-based alloy (see Section 2.2) and train several GANs. To demonstrate the merits of the physics-guided method, we compare the results of the physics-guided GAN to a classical GAN approach in terms of visual quality of generated samples (see Section 3.2) and distance of fake data distributions to the real training data. The latter is quantified using the sliced Wasserstein distance (SWD) [25] and the geometry score (GS) [26] (see Section 2.4). We show that the physics-guided approach accelerates the training and leads to physically more consistent results.

Methodology
Generative Adversarial Networks (GANs) are generative machine learning models learned using an adversarial training process [27]. In this framework, two neural networks, the generator G and the discriminator D, contest against each other in a zero-sum game. Given a training dataset characterized by a distribution p_data, the generator aims to produce new data following p_data while the task of the discriminator is to distinguish generated data samples from actual training data samples.
Given a noise vector z sampled from a prior, e.g. the standard normal distribution, the generator outputs data samples G(z), called fake samples, trying to follow the training data distribution p_data. Given a real or fake sample, the discriminator is supposed to decide whether it is real or fake by predicting the probability of it belonging to the training dataset.
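Both models G and D are trained simultaneously, contesting against each other in a two-player zero-sum minimax game with the value function V(G, D):

$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right],$$

where $p_z$ denotes the prior of the noise vector z. This means D is trained to minimize the discriminator loss

$$L_D = -\mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] - \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right],$$

whereas G is trained to minimize the generator loss

$$L_G = \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right].$$

As the discriminator gets better at identifying fake samples G(z), the generator has to improve on generating samples which are more similar to the real training samples x ∼ p_data. We refer to [27] for further details of the training algorithm.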
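The two objectives described above can be sketched on a batch of discriminator outputs. This is a minimal illustration, assuming the standard formulation of [27] in which the discriminator loss is the negative value function and the generator minimizes the mean of log(1 − D(G(z))). The function names and the clipping constant `eps` are illustrative assumptions and not taken from the original work.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Negative value function estimated on one batch.

    d_real: D(x) for real samples, d_fake: D(G(z)) for fake samples,
    each a list of probabilities in (0, 1).
    eps is a hypothetical safeguard against log(0).
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake, eps=1e-12):
    """Minimax generator loss: batch mean of log(1 - D(G(z)))."""
    return sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
```

A discriminator that separates the batch well (high D(x), low D(G(z))) obtains a lower discriminator loss, while the generator loss decreases as D(G(z)) approaches 1.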

Digital image correlation
Digital image correlation (DIC) is a contact-less, optical method to obtain full-field displacements and strains. It is widely applied in science and engineering to quantify deformation processes. In experimental mechanics, it is used to monitor and evaluate fatigue crack growth (FCG) experiments [28] by determining fracture mechanical parameters like stress intensity factors (SIFs) [3] or the J-integral [4]. Essentially, DIC measurements are based on the comparison of a current image with a reference image using tracking and image registration techniques. The cross correlation method requires a random speckle pattern on the sample surface. Various external and internal influences such as illumination, air movement, vibrations, facet size and spacing, camera settings, sensor noise, pattern quality, etc. lead to inherent noise in the DIC data. Our goal is to generate artificial interpolated DIC-like displacement data using GANs. Since this data incorporates characteristic DIC noise, it can subsequently be used to improve the training of machine learning models such as crack detection [9,10].

Training data
To create a dataset for the training of our GANs, we use planar displacement fields u = (u_x, u_y) obtained during FCG experiments of the aluminium alloy AA2024-T3 using a commercial GOM Aramis 12M 3D-DIC system. Details on the general experimental setup can be found in [9]. For the dataset, we use one FCG experiment performed on a middle tension (MT) specimen (width w = 160 mm, thickness t = 2 mm). While the specimen is clamped at the bottom, a load is applied from the top with a maximal force of F_max = 15 kN (corresponding to σ_max = 47 MPa) and ratio R = 0.3 with 20 load cycles per second. Every 0.5 mm of crack growth (measured by direct current potential drop), 3 images (at minimal load F_min = R · F_max, mean load F_min + (F_max − F_min)/2, and maximum load F_max) were acquired. From the resulting DIC dataset, we take the planar displacements u_x and u_y of the specimen and linearly interpolate them from an area of 70 × 70 mm² on the right-hand side of the specimen onto an equidistant 256 × 256 pixel grid. This procedure results in 838 data samples of shape 2 × 256 × 256, where the first dimension stands for the x- and y-displacements. Each of the two channels is normalized to [−1, 1] by min-max scaling and shifting such that the minimum and maximum are mapped to −1 and 1, respectively.
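The channel normalization at the end of this paragraph can be sketched as follows. The text does not state whether the minimum and maximum are taken per sample or over the whole dataset, so this sketch assumes per-channel scaling of a single sample; the function name is hypothetical.

```python
def minmax_to_unit_range(channel):
    """Map a 2D list of floats linearly so that min -> -1 and max -> 1.

    Assumption: scaling is applied to one channel of one sample.
    """
    flat = [v for row in channel for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        raise ValueError("a constant channel cannot be min-max scaled")
    scale = 2.0 / (hi - lo)
    return [[(v - lo) * scale - 1.0 for v in row] for row in channel]
```

Applied to both the x- and y-displacement channel, this yields value ranges that match the tanh output range of a generator.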

Physics-guided GAN
We aim to generate artificial interpolated DIC displacement data using deep convolutional GANs. For this, we mainly follow the architectural guidelines from [29]. However, in order to reduce checkerboard artefacts, we choose nearest-neighbor upsampling instead of transposed convolutions in the generator [30].
We remark that GANs cannot be expected to generalize beyond the training data. The reason for this is that the generator learns to produce fake samples that approximate the distribution of the training data. To cover a different type of experiment or material, the training dataset must be extended.
Generator. First, the random vector z passes a fully-connected layer with 8 · 8 · 512 = 32768 neurons, batch normalization [31], and Rectified Linear Unit (ReLU) activation [32]. The output of this layer is then reshaped into 512 features of size 8 × 8. After that, these features are successively doubled in size using the base block (upsampling → batch normalization → ReLU → convolution) four times. The final block ends with a tanh activation instead of ReLU. Therefore, in accordance with the training data, the generator outputs fake samples with pixel values between −1 and 1.
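The spatial sizes in the generator can be traced with a few lines of arithmetic. This sketch assumes that the final tanh block also contains an upsampling step, so that four base blocks plus the final block give five doublings in total; this is what is needed to get from 8 × 8 features to the 256 × 256 output, but the text does not state it explicitly. The function name is hypothetical.

```python
def generator_feature_sizes(start=8, n_upsampling_blocks=5):
    """Return the spatial size after each nearest-neighbor upsampling step.

    Assumption: 4 base blocks + 1 final (tanh) block, each doubling the size.
    """
    sizes = [start]
    for _ in range(n_upsampling_blocks):
        sizes.append(sizes[-1] * 2)
    return sizes


# Number of neurons of the first fully-connected layer (8 x 8 x 512 features).
fc_neurons = 8 * 8 * 512
```

With only four doublings the output would be 128 × 128, which is why the fifth doubling is assumed here.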
Discriminator. For the discriminator, we implemented the following two approaches:
1. Classical: The discriminator gets real and fake pairs of interpolated x- and y-displacement fields (u_x, u_y) and predicts a (pseudo-)probability of the sample being real. We refer to this approach as classical GAN.
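2. Physics-guided: In addition to the interpolated displacement fields, the corresponding von Mises equivalent strain ε_vm is calculated from the generated and real interpolated displacement fields, and the discriminator gets the triple (u_x, u_y, ε_vm) as input in order to decide whether it is fake or real. For small strains, the von Mises equivalent strain is defined as the scalar quantity

$$\varepsilon_{vm} = \sqrt{\tfrac{2}{3}\, \varepsilon^{dev} : \varepsilon^{dev}}, \tag{5}$$

where $\varepsilon^{dev} = \varepsilon - \tfrac{1}{3} \operatorname{tr}(\varepsilon)\, I$ denotes the deviatoric part of the three-dimensional strain tensor

$$\varepsilon = \begin{pmatrix} \varepsilon_{xx} & \varepsilon_{xy} & \varepsilon_{xz} \\ \varepsilon_{xy} & \varepsilon_{yy} & \varepsilon_{yz} \\ \varepsilon_{xz} & \varepsilon_{yz} & \varepsilon_{zz} \end{pmatrix}.$$

In case of plane stress, ε_xz = ε_yz = 0 and ε_zz = −ν/(1 − ν) · (ε_xx + ε_yy). Assuming volume constancy with a Poisson ratio of ν = 1/2, we get ε_zz = −(ε_xx + ε_yy), the trace of ε vanishes, and Formula (5) simplifies to

$$\varepsilon_{vm} = \frac{2}{\sqrt{3}} \sqrt{\varepsilon_{xx}^2 + \varepsilon_{xx}\varepsilon_{yy} + \varepsilon_{yy}^2 + \varepsilon_{xy}^2}. \tag{7}$$

We use Formula (7) for the physics-guided discriminator. To this end, we numerically approximate the strains using finite differences, e.g. $\varepsilon_{xx} = \partial u_x / \partial x \approx \Delta u_x / \Delta x$.
To guarantee differentiability, the square-root function is smoothed by using √(· + δ) with δ ≪ 1. We refer to this approach as physics-guided GAN (see Figure 1). The discriminator, which drives the training of the generator, has additional physical information, namely the equivalent strain, that the generator can only influence indirectly by producing physically consistent displacement fields. Certainly, other quantities that can be derived from the displacement fields, such as the strains ε_xx, ε_xy, or ε_yy, can be used to guide the discriminator. However, we decided on using the equivalent strain, since the crack path and crack tip field are well visible in it.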
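The strain feature handed to the physics-guided discriminator can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: it assumes central finite differences on the grid interior, the small-strain definitions ε_xx = ∂u_x/∂x, ε_yy = ∂u_y/∂y, ε_xy = (∂u_x/∂y + ∂u_y/∂x)/2, and the plane-stress simplification for ν = 1/2 in the form ε_vm = (2/√3) · √(ε_xx² + ε_xx ε_yy + ε_yy² + ε_xy² + δ). The function name, the grid spacing `h`, and the default `delta` are assumptions.

```python
import math


def von_mises_strain(ux, uy, h=1.0, delta=1e-12):
    """Equivalent strain on the interior of a regular grid.

    ux, uy: 2D lists indexed [row][col], x along columns, y along rows.
    h: grid spacing. delta: smoothing inside the square root.
    The result is two entries smaller per dimension because central
    differences are only evaluated at interior points.
    """
    n_rows, n_cols = len(ux), len(ux[0])
    result = []
    for i in range(1, n_rows - 1):
        row = []
        for j in range(1, n_cols - 1):
            dux_dx = (ux[i][j + 1] - ux[i][j - 1]) / (2 * h)
            dux_dy = (ux[i + 1][j] - ux[i - 1][j]) / (2 * h)
            duy_dx = (uy[i][j + 1] - uy[i][j - 1]) / (2 * h)
            duy_dy = (uy[i + 1][j] - uy[i - 1][j]) / (2 * h)
            exx, eyy = dux_dx, duy_dy
            exy = 0.5 * (dux_dy + duy_dx)
            s = exx ** 2 + exx * eyy + eyy ** 2 + exy ** 2
            row.append(2.0 / math.sqrt(3.0) * math.sqrt(s + delta))
        result.append(row)
    return result
```

For an incompressible uniaxial state with ε_xx = e and ε_yy = −e/2, this returns ε_vm = e, as expected for an equivalent strain. In an automatic-differentiation framework, the same stencil would be written with tensor operations so that gradients flow back to the generator.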
In both cases, we choose the same model architecture for the discriminator. The input of size 2 × 256 × 256, or 3 × 256 × 256 in case of the physics-guided discriminator, is successively downsampled to the size 1 × 32 × 32 using three blocks of strided convolutions, batch normalization, and LeakyReLU activation [33], where LeakyReLU(t) = max(αt, t) with α = 0.2. The extracted features are then flattened and pass the last fully-connected layer with one output neuron and sigmoid activation. The output is a number between 0 and 1 and is interpreted as the probability of the sample being real.
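The activation functions and the spatial downsampling of the discriminator can be checked with a short sketch. The stride of 2 per block is an assumption: it is not stated in the text, but it is the value that maps 256 to 32 in three blocks. All function names are hypothetical.

```python
import math


def leaky_relu(t, alpha=0.2):
    """LeakyReLU(t) = max(alpha * t, t) with alpha = 0.2 as in the text."""
    return max(alpha * t, t)


def sigmoid(t):
    """Maps the final logit to a (pseudo-)probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def downsampled_size(size=256, n_strided_blocks=3, stride=2):
    """Spatial size after the strided convolution blocks (assumed stride 2)."""
    for _ in range(n_strided_blocks):
        size //= stride
    return size
```

The flattened 1 × 32 × 32 feature map therefore has 1024 entries feeding the final fully-connected layer.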

Evaluation of GANs
In classical supervised learning, a model is trained by minimizing a specific loss (e.g. mean squared error), which quantitatively compares model predictions with the expected target. After training, models can be evaluated and compared by calculating the loss (and accuracy) for independent labeled test data. GAN generators, however, are trained in an adversarial fashion using a second model (the discriminator) to classify generated data as real or fake. Both models are trained simultaneously to maintain an equilibrium. Therefore, there is no natural objective measure to evaluate GANs quantitatively. Instead, they are evaluated by assessing the quality and variation of generated data. This is typically achieved by visual inspection of generated samples or by calculating the inception score (IS) [34] and Fréchet inception distance (FID) [35]. However, in case of interpolated DIC data, several domain experts would be needed to objectively grade the visual quality of generated samples. Moreover, quantitative metrics like IS or FID can only be employed for natural images since they use image classification networks like Inception [36], which are pre-trained on ImageNet [37]. Therefore, in addition to a visual examination of generated samples in Section 3.2, we use metrics which are independent of the data type and do not use any pre-trained models. More precisely, we use the following two metrics:
Sliced Wasserstein distance. In mathematics, the Wasserstein distance is a natural distance function between two distributions. Intuitively, it can be viewed as the minimal cost of transforming one of the distributions into the other. In case of image-like datasets X = {X_n}, n = 1, ..., N, and Y = {Y_n}, n = 1, ..., N, with the same number of samples N and image sizes c × h × w, where c is the number of channels and h and w denote the height and width of images, respectively, the (quadratic) Wasserstein distance is given by

$$W(X, Y)^2 = \min_{\pi} \sum_{n=1}^{N} \left\| X_n - Y_{\pi(n)} \right\|^2, \tag{9}$$

where the minimum is taken over all permutations π of the set {1, ..., N} [25]. Due to the high dimensionality of images and the large number of samples, the exact computation of the Wasserstein distance is computationally infeasible. This is because the number of permutations grows factorially with the number of samples N. Therefore, instead of (9), we use the sliced Wasserstein distance (SWD) introduced in [25] as an approximation, which is amenable to efficient numerical computation. The main idea of slicing is to map the high-dimensional image data from R^{c×h×w} onto one-dimensional slices. On these slices, the Wasserstein distance can be calculated in loglinear time by using the ordered structure of one-dimensional Euclidean space. The sliced Wasserstein distance is defined as

$$SW(X, Y)^2 = \int_{\Omega} W(X_\theta, Y_\theta)^2 \, \mathrm{d}\theta, \tag{10}$$

where $X_\theta = \{\langle X_n, \theta \rangle\}_{n=1,\ldots,N}$ and $Y_\theta = \{\langle Y_n, \theta \rangle\}_{n=1,\ldots,N}$ are the projections of the datasets onto the slice with direction θ, and Ω = {θ ∈ R^{c×h×w} : ‖θ‖ = 1} denotes the unit sphere. We refer to [14,25] and Section 3.3 below for further details.
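The slicing idea can be sketched as a Monte Carlo estimate: draw random unit directions, project both datasets, sort the projections, and average the squared one-dimensional distances. This is a minimal pure-Python sketch under stated assumptions (flattened samples as lists of floats, a uniform average over random directions in place of the integral, no 1/N normalization); it is not the GPU implementation used in the paper, and the function name and `seed` parameter are hypothetical.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_slices=512, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein distance.

    xs, ys: equally sized lists of flattened samples (lists of floats).
    """
    if len(xs) != len(ys):
        raise ValueError("both datasets need the same number of samples")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_slices):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # In 1D the optimal permutation pairs the sorted projections,
        # which costs O(N log N) instead of a search over permutations.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(b * t for b, t in zip(y, theta)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py))
    return math.sqrt(total / n_slices)
```

The estimate is zero for two datasets that contain the same samples in any order and grows as the datasets move apart.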
Geometry score. Introduced by Khrulkov & Oseledets [26], the geometry score (GS) makes it possible to quantify the performance of GANs trained on datasets of arbitrary nature. It measures the similarity between the real dataset X_real and a generated one X_fake by comparing topological properties of the underlying low-dimensional manifolds [38]. The detailed quantitative characterization of the underlying manifold of a given dataset X is usually very hard. The core idea of [26] is to choose random subsets L ⊂ X called landmarks and to build a family of simplicial complexes, parametrized by a non-negative, time-like persistence parameter α. For small α, the complexes consist of a disjoint union of points. Increasing α adds more and more simplices, finally leading to one single connected blob. For each value of α, topological properties of the corresponding simplicial complex, namely the number of one-dimensional holes in terms of homology, β_1(α), are calculated (see, e.g., [39]). From this, the authors propose to compute Relative Living Times (RLTs) for every number of holes that was observed [26]. For each non-negative number i, the RLT is the amount of time when exactly i holes were present relative to the overall time α_max after which everything is connected. More precisely,

$$\mathrm{RLT}(i, X, L) = \frac{\mu\left(\{\alpha \in [0, \alpha_{\max}] : \beta_1(\alpha) = i\}\right)}{\alpha_{\max}},$$

where µ denotes the standard Lebesgue measure. Since the RLTs depend on the choice of landmarks L, we choose a collection of n random sets of landmarks L_j and define the Mean Relative Living Times (MRLTs) as

$$\mathrm{MRLT}(i, X) = \frac{1}{n} \sum_{j=1}^{n} \mathrm{RLT}(i, X, L_j).$$

The MRLT is a discrete probability distribution over the non-negative integers. It can be interpreted as the probability of the manifold having exactly i one-dimensional holes (on average). The L²-distance between the MRLT distributions of X_real and X_fake defines a measure of topological similarity between the real dataset and the generated one, called geometry score (GS):

$$\mathrm{GS}(X_{\mathrm{real}}, X_{\mathrm{fake}}) = \sum_{i=0}^{i_{\max}-1} \left(\mathrm{MRLT}(i, X_{\mathrm{real}}) - \mathrm{MRLT}(i, X_{\mathrm{fake}})\right)^2,$$

where i_max is an upper bound on the number of holes. We refer to [26] for further theoretical details and to Section 3.4 for the choice of hyperparameters and results in our case.
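Once the number of holes β_1(α) is known as a piecewise-constant function of α, the remaining steps (RLT, MRLT, GS) are simple bookkeeping. The sketch below assumes that the Betti curve of each landmark set has already been computed by a persistent homology tool, which is not shown here; the input format (breakpoints plus hole counts) and all function names are hypothetical, and the GS is taken as the sum of squared MRLT differences.

```python
def relative_living_times(breakpoints, betti_1, alpha_max, i_max):
    """RLT(i) for i = 0..i_max-1 from a piecewise-constant Betti curve.

    breakpoints: increasing values a_0 = 0 < a_1 < ... at which beta_1 changes.
    betti_1[k]: number of holes on [a_k, a_{k+1}); the last interval
    ends at alpha_max.
    """
    rlt = [0.0] * i_max
    edges = list(breakpoints) + [alpha_max]
    for k, holes in enumerate(betti_1):
        if holes < i_max:
            rlt[holes] += (edges[k + 1] - edges[k]) / alpha_max
    return rlt


def mean_relative_living_times(rlts):
    """Average the RLT vectors obtained from n random landmark sets."""
    n = len(rlts)
    return [sum(column) / n for column in zip(*rlts)]


def geometry_score(mrlt_real, mrlt_fake):
    """Sum of squared differences between two MRLT distributions."""
    return sum((a - b) ** 2 for a, b in zip(mrlt_real, mrlt_fake))
```

For example, a Betti curve with no hole on [0, 1), one hole on [1, 3), and no hole on [3, 4] with α_max = 4 gives RLT(0) = RLT(1) = 0.5.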

Results and discussion
In order to demonstrate the effectiveness of the method and to compare the classical with the physics-guided discriminator approach, we trained 10 randomly initialized classical and physics-guided GANs each for 100 epochs. Moreover, we trained two classical and physics-guided GANs each for 1000 epochs in order to compare both architectures after long training runs. The training setup is described in Section 3.1 below. The trained models are evaluated qualitatively and quantitatively by using the following criteria:
• Visual inspection of randomly generated samples (Section 3.2)
• Sliced Wasserstein distances (Section 3.3)
• Geometry scores (Section 3.4)
A summary of the results can be seen in Table 1. In short, the physics-guided GAN approach leads to visually better results after 100 epochs and overall to measurably better results. For a detailed discussion, we refer to the sections below.

Table 1 columns: Model, Epoch, Visual quality, GS ×10^3, SWD ×10^3.

Training procedure
Before training, the filters of the convolution layers in both the generator and the discriminator network are initialised randomly from a normal distribution with zero mean and a standard deviation of 0.02. In contrast, the weights of the batch normalization layers are initialised from a normal distribution with mean 1 and a standard deviation of 0.02, whereas the biases are initialised with zeros.
For training, we choose the Adam optimizer [40] with a learning rate of 0.002, momentum parameters of β1 = 0.5, β2 = 0.999, and a batch size of 8. We noticed that occasionally models suffer from mode collapse during training. This means that the generator always outputs the same or visibly similar fake data samples and stops learning. This problem is well-known and still part of active research. Popular strategies to overcome convergence issues of GANs regularize or perturb the discriminator [41,42] or use a more sophisticated loss function [43]. In our case, if mode collapse happened, we restarted the training and discarded the collapsed model. All neural networks and training loops were implemented using PyTorch [44]. The hardware for the training was an NVIDIA RTX8000 graphics card.

Visual evaluation
We begin with a visual inspection of the generated data and compare real training data to representative samples generated by the classical GAN and the physics-guided GAN.We refer to fake samples generated by the classical or physics-guided GAN generators as classical or physics-guided GAN samples, respectively.
Figure 2 shows real interpolated DIC data samples obtained during FCG experiments as described in Section 2.2. The figure contains planar displacements and von Mises equivalent strains of 9 randomly selected data samples. Images belong together in the sense that the x-displacement of the first sample is located at the top left of the left column. The corresponding y-displacement is located at the same position in the middle column, and the corresponding calculated equivalent strain is located at the same position in the right column. Here, the crack path as well as the characteristic crack tip field is clearly visible.
In Figure 3, we see random fake samples after 100 epochs of GAN training. We can often identify the initial crack on the left edge and the crack path. Whereas most generated displacements are visually close to real displacements, significant differences are revealed in the von Mises strains, which are calculated afterwards. Especially classical GAN samples contain inconsistencies between x- and y-displacements and visual artefacts. This leads to large-scale vortices and small-scale noise in the von Mises strains ①. Although far from being perfect, physics-guided GAN samples contain significantly fewer of these artefacts and inconsistencies and visually capture the inherent noise of the DIC system much better than classical GAN samples. Nevertheless, most fake samples are still visually distinguishable from real samples. In order to make sure the models are fully converged, we also performed some longer training runs.
Figure 4 shows random fake samples after 1000 epochs. At this stage, the models are well converged and the visual difference between classical and physics-guided GAN samples has mainly disappeared. In general, the fake samples of both models show much better visual quality and fewer artefacts and inconsistencies compared to fake samples of generators trained for only 100 epochs ③. However, a few samples suffer from severe inconsistencies and are qualitatively inferior ②. We refer to these failures as garbage samples. Garbage samples may arise from difficulties in the training process of GANs and problems of non-convergence like mode collapse, which is an open research problem [34]. Although we do not observe mode collapse for the results shown here, the occurrence of garbage samples is a sign of local non-convergence, i.e. some noise inputs are mapped to garbage samples. Since the models are converged after 1000 epochs w.r.t. the difference in output between two epochs, the garbage samples seem to originate from intrinsic difficulties in the training process and are not related to the number of training epochs. Apart from these outliers, the vast majority of samples (of both models) are visually indistinguishable from real samples. Nevertheless, domain experts may notice that the characteristic crack tip field still seems unphysical in the fake samples, especially when compared to real samples with long cracks.

Sliced Wasserstein distances
For a thorough comparison of GANs, one needs to inspect a large number of fake samples. Doing this manually would be very tedious and subjective. Instead, one should compare the results using meaningful, quantitative metrics.
For this, we follow [14] and calculate the sliced Wasserstein distances (SWD) introduced in Section 2.4 between fake data samples and real data samples on various scales. These scales are introduced by building a 5-level Laplacian pyramid [45] with resolutions 16 × 16, 32 × 32, 64 × 64, 128 × 128, and 256 × 256. Each pyramid level corresponds to a specific spatial resolution. For each level, we compute the SWD between the training dataset and a generated fake dataset of the same size. More precisely, the SWDs are calculated between datasets of random 7 × 7 patches of the pyramid samples. The patches are pre-processed by normalizing each channel (i.e. x- and y-displacement) to mean 0 and standard deviation 1. To reduce uncertainty, we average the SWDs of ten runs with randomly sampled fake data. Since there are fewer unique patches for low resolutions, we adapt the number of random patches depending on the pyramid level. For the five resolutions 16 × 16, 32 × 32, 64 × 64, 128 × 128, and 256 × 256, we use 128, 256, 512, 1024, and 2048 patches, respectively. The integral in Equation (10) is approximated by choosing 512 random slices and averaging the results. We implemented a GPU-enabled version of the code from [14] using the PyTorch [44] framework.
At least intuitively, a small SWD shows that the fake and real samples are similar. At low resolution (e.g. 16 × 16), only large-scale features like the crack length are visible, and a small SWD would indicate that the variation of crack lengths in the fake dataset is similar to the training dataset. At high resolution (e.g. 256 × 256), very fine-grained structures like the inherent DIC noise are encoded in the patches.
Figure 5 shows the calculated SWDs of classical and physics-guided GAN samples after 100 epochs of training. In order to estimate uncertainty, we trained 10 randomly initialized models each with the classical and the physics-guided GAN architecture. The main observation is that for all resolutions, physics-guided samples are closer to the training data than classical GAN samples. This indicates that physics-guided GAN samples are better in quality and variation. Especially for the high resolution 256 × 256, the SWDs show a large gap and confirm our visual observation of artefacts and unphysical noise as seen in the classical GAN samples in Figure 3. Nevertheless, the results can be significantly different for each trained generator. This fact is reflected in the large error bars of the SWDs.
The results after 1000 training epochs are displayed in Figure 6. Here, we used 2 training runs for each GAN architecture. As expected, the distances are all smaller than after 100 epochs. In contrast to the results after 100 epochs (cf. Figure 5), both GAN architectures are closer together. However, the physics-guided GAN samples have significantly smaller SWDs for the fine resolution 256 × 256 and the low resolutions 16 × 16 and 32 × 32. This suggests that after 1000 epochs of training the physics-guided samples are still closer to the real samples in terms of quality and variation. Nevertheless, the few garbage samples seen in Figure 4 ② could influence the SWDs, especially at larger pyramid levels, e.g. 256 × 256.

Geometry scores
To compare the geometry score (GS) introduced in Section 2.4 of different trained GANs, we generated fake datasets with the same number of samples N = 838 as the training dataset. To calculate the MRLTs of the real and fake datasets, we mainly follow the recommendations in [26]. We set i_max = 100 and use n = 1000 random sets of landmarks, each containing 64 samples. The maximal persistence time α_max is proportional to the maximal pairwise Euclidean distance between samples in each landmark set, i.e. for j = 1, ..., n:

$$\alpha_{\max}^{(j)} = \gamma \max_{x, y \in L_j} \left\| x - y \right\|_2$$

with a proportionality constant γ > 0. We used the implementation from [26] to calculate the MRLTs.
Figure 7 shows the distributions of MRLTs after 100 epochs of training. The error band originates from the uncertainty induced by the random landmarks and, even more so, from the 10 different models trained for each GAN architecture. This results in large variations of calculated MRLTs. Nevertheless, on average the physics-guided GAN distribution is closer to the MRLTs of the real data distribution than the classical GAN distribution. This observation is quantitatively reflected in a smaller mean GS of the physics-guided models (see Table 1). However, both fake data distributions are still far away from the real data distribution and the GSs are large.
In Figure 8, we see the MRLT distributions after 1000 epochs of training. Both GAN results are much closer to the real data than after 100 epochs and the physics-guided GAN MRLTs almost coincide with the real data MRLTs. This agreement is shown in the calculated GSs in Table 1 as well.

Conclusion
We introduced a machine learning framework to generate artificial full-field displacements of cracked specimens by learning the underlying data distribution from a sufficiently large digital image correlation dataset. The training data was obtained during fatigue crack growth experiments of the aluminium alloy AA2024-T3. In contrast to finite element simulations, our method is able to produce large amounts of interpolated DIC-like displacement data in a fast and easy way but is limited in the sense that boundary conditions and crack configurations cannot be controlled.
Our approach is based on deep convolutional generative adversarial networks (DC-GANs). The main novelty compared to the classical DC-GAN framework is a physics-guided discriminator. This discriminator, in addition to the generated x- and y-displacement fields, also receives the derived von Mises equivalent strain as input. This enables the discriminator to detect physical inconsistencies in the generated fake samples more easily, thus enhancing the training process.
In order to evaluate trained generator models on an objective basis, we used two quantitative metrics: first, the sliced Wasserstein distance (SWD) between real and fake samples and, secondly, the geometry score (GS) approximating the topological distance between a generated data manifold and the training data manifold.
We observed superior performance of the physics-guided GAN compared to the classical GAN approach. This result was observed by visual evaluation of generated samples and confirmed by lower SWDs and GSs of the physics-guided models. Both SWD and GS proved to be valuable evaluation metrics. They are useful to identify mode collapse and to select the best trained models. Nevertheless, it is important to note that there is no natural metric to evaluate the performance of GANs. In the absence of powerful pre-trained models like Inception for DIC-like data, we had to rely on GAN metrics that are independent of these benchmark models. Our findings support the claim that hybrid models, which combine data-driven methods with physical domain knowledge, can lead to more powerful models and faster training.
The visual inspection revealed a varying sample quality. Especially the converged models after 1000 epochs of training, apart from mostly good samples, produce a few garbage samples. Although the number of these garbage samples is model-dependent, we were not able to avoid their occurrence completely. Moreover, we still face the issue of (local) non-convergence and mode collapse. To overcome these issues, one could try to stabilize training using suitable regularization techniques [42,46].
The main open problem concerns the control of boundary conditions like the crack path and external force. In contrast to FE-based data generation, with our approach it is not possible to control them. This challenge could be tackled by using a conditional GAN framework [47] and is part of current research.

Figure 1: Physics-guided GAN framework: A deep convolutional generator G creates fake interpolated DIC displacement data samples from noise z. These samples (u_x^f, u_y^f) are used to calculate the corresponding von Mises equivalent strain ε_vm^f. All three features are handed to the discriminator D, which has to decide whether samples are real or fake.

Figure 2: Random samples from the training dataset. Left: x-displacements. Middle: y-displacements. Right: von Mises equivalent strains. The corresponding grid elements belong to the same data point.

Figure 3: Visual comparison of randomly generated classical and physics-guided GAN samples after 100 epochs of training. Classical GAN samples show a larger noise level in the von Mises strains compared to physics-guided GAN samples ①.

Figure 4: Visual comparison of randomly generated classical and physics-guided GAN samples after 1000 epochs of training. Both models seem to produce mostly good samples ③ but also a few garbage samples ②.

Figure 5: Comparison of SWDs between classical GAN (left) and physics-guided GAN (right) trained for 100 epochs. The boxplot intervals range from the minimal to the maximal SWDs. The box ranges from the 25% to the 75% quantile and shows the median.

Figure 6: Comparison of SWDs between classical GAN (left) and physics-guided GAN (right) trained for 1000 epochs. The boxplot intervals range from the minimal to the maximal SWDs. The box ranges from the 25% to the 75% quantile and shows the median.

Figure 7: Comparison of MRLT distributions between the real dataset and fake datasets generated by classical and physics-guided GAN after 100 epochs of training.

Figure 8: Comparison of MRLT distributions between the real dataset and fake datasets generated by classical and physics-guided GAN after 1000 epochs of training.

Table 1: Subjective visual quality, calculated geometry score (GS, ×10^3), and sliced Wasserstein distance (SWD, ×10^3) (lower is better) for different GAN model architectures and training lengths. (*) apart from garbage samples (see Figure 4).