Evaluation of surface nuclear magnetic resonance-estimated subsurface water content

The technique of nuclear magnetic resonance (NMR) has found widespread use in geophysical applications for determining rock properties (e.g. porosity and permeability) and state variables (e.g. water content) or to distinguish between oil and water. NMR measurements are most commonly made in the laboratory and in boreholes. The technique of surface NMR (or magnetic resonance sounding (MRS)) also takes advantage of the NMR phenomenon, but by measuring subsurface rock properties from the surface using large coils of some tens of meters and reaching depths as much as 150 m. We give here a brief review of the current state of the art of forward modeling and inversion techniques. In laboratory NMR a calibration is used to convert measured signal amplitudes into water content. Surface NMR-measured amplitudes cannot be converted by a simple calibration. The water content is derived by comparing a measured amplitude with an amplitude calculated for a given subsurface water content model as input for a forward modeling that must account for all relevant physics. A convenient option to check whether the measured signals are reliable or the forward modeling accounts for all effects is to make measurements in a well-defined environment. Therefore, measurements on top of a frozen lake were made with the latest-generation surface NMR instruments. We found the measured amplitudes to be in agreement with the calculated amplitudes for a model of 100 % water content. Assuming then both the forward modeling and the measurement to be correct, the uncertainty of the model is calculated with only a few per cent based on the measurement uncertainty.


Introduction
The technique of nuclear magnetic resonance (NMR) is best known in medical, chemical and physical applications where it allows a glimpse of the structure of matter at the scale of atoms. Additionally, NMR has found widespread use in geophysical applications for determining rock properties (e.g. porosity and permeability) and state variables (e.g. water content) or to distinguish between oil and water. NMR measurements are most commonly made in the laboratory and in boreholes.
The technique of surface NMR combines the information content accessible via NMR measurements with the nondestructive approach to derive subsurface information from surfacebased measurements. As such, surface NMR is the only geophysical exploration method providing direct and nondestructive information on important hydraulic subsurface rock properties.
Surface NMR is based on the free induction decay (FID) experiment, emitting an excitation pulse and recording the relaxation signal using large surface coils of tens of meters and detecting signals from depths as much as 150 m. First field measurements date back to the 1980s (Semenov et al 1988). In contrast to laboratory NMR, where a sample is placed inside a strong artificial primary magnetic field, the object of investigation is outside the coil and the earth's field acts as the primary field. We give a brief review of the basic principles and state-of-the-art in the first section.
In laboratory NMR a calibration is used to convert measured signal amplitudes into water content. Surface NMR-measured amplitudes cannot be converted by a simple calibration. The water content is derived by comparing a measured amplitude with an amplitude calculated for a given subsurface water content model, which acts as input for a forward modeling that must account for all relevant physics.
A usual option to check whether the measured signals are reliable or the forward modeling accounts for all effects is to conduct measurements in a well-defined and known environment. If the derived results match the known environment, one can expect to have a 3 working method. Various groups of the scientific community have made such measurements and have demonstrated the reliability and potential of the method under different environmental conditions and geology (Gev et al 1996, Lange et al 2000, Vouillamoz et al 2005, well-known test sites (Yaramanci et al 1999) and in comparison with borehole measurements (Müller-Petke et al 2011) or electrical methods (Goldman et al 1994). However, the most convenient option to verify the surface NMR measuring and modeling technique is to make measurements on a frozen lake. Such conditions provide high signals, but more importantly precise knowledge of the subsurface water content and a homogeneity that is unique compared to other true geological settings. Even though such measurements have been made before (Schirov et al 1991), their documentation is rather spare and detailed analysis is missing. Therefore, we made such measurements with the latest-generation surface NMR instruments (GMR; Vista Clara) on a frozen lake located in the Harz mountains (Germany, Lower Saxony). The lake not only provides a known water content of 100%, but also since it is an artificial barrier lake built for energy supply the geometry is known very exactly from the construction plans.
These measurements are used to • check that the surface NMR-measured signal amplitudes are in agreement with the calculated amplitudes based on the known lake geometry and water content. If measurement and modeling are in agreement, it is unlikely that both contain errors that compensate for each other. Thus, it is more likely that both are correct.
• evaluate the estimated subsurface depth distribution of the water content, i.e. estimating the thickness of the ice layer and the depth of the hard rock basement. Both boundaries are sharp interfaces.
• discuss the plausibility of the estimated error measures for water content (variances) and layer thickness (resolution).

Review of principles of surface nuclear magnetic resonance
To introduce the basics of surface NMR, we first briefly explain the idea before giving a detailed mathematical description of the method later.

The idea of surface nuclear magnetic resonance
A surface NMR measurement is a set of NMR FID experiments. The principal scheme of an FID experiment is depicted in figure 1. In the presence of a static magnetic field B 0 , a net magnetization M 0 is built up after a certain time. This state is called the equilibrium state. The amplitude M 0 of this net magnetization is defined as with γ the gyromagnetic ratio, µ 0 the magnetic permeability in vacuum,h Planck's constant, k b the Boltzmann constant and c the number of protons per unit volume (Levitt 2002). In contrast with laboratory measurements using a permanent magnet of high field strength, the earth's magnetic field is used as the primary field B 0 . The net magnetization is built by a number of microscopic magnetic moments that exist due to an intrinsic property of a hydrogen proton, the spin. The orientation of a magnetic moment in a magnetic field is forced to move on a rotary motion with the Larmor frequency Due to the low magnetic field strength of the earth's field, roughly ranging from 25 to 55 µT (1000-2300 Hz), both the net magnetization and the Larmor frequency are low. Large surface coils of some tens of meters are used to emit the excitation (transmitter) field B T (r) at the Larmor frequency. This excitation field causes the net magnetization to re-orientate from its equilibrium state. In laboratory NMR, a sample is placed inside a coil that emits an (inside the coil homogeneous) excitation field for a certain time τ p . The time τ p is varied until 90 • flip angles are applied. The homogeneity of the 90 • flip angles across the sample depends on the homogeneity of B 0 . In surface NMR, the sample is outside the coil and the excitation field is therefore inhomogeneous. Roughly speaking, the field strength decreases with distance to the coil. The flip angles realized by the coil excitation field are not 90 • in the complete subsurface, i.e. the distribution is not homogeneous. The distribution of flip angles depends on the spatial properties of B T (r) (figure 2), the amplitude of the current I causing the field and the time τ p the field is emitted. These spatial properties depend on the coil geometry, i.e. the size and shape and the subsurface conductivity. In contrast to the laboratory, τ p is not short but up to 40 ms long. Since both I and τ p are of importance, a pulse moment q = I × τ p is given in surface NMR. This pulse moment summarizes both parameters that can be varied in order to change the flip angle distribution in the subsurface.
After the excitation field is turned off, the rotary motion of the net magnetization emits an electromagnetic field while relaxing back to the equilibrium state. This receiver field B R (r) can be detected either by the same coil or, as recently presented by Hertrich et al (2005), by a spatially separated coil. The relaxation signal recorded is the cumulative response of all hydrogen protons excited by the excitation field with respect to the distribution of flip angles. Consequently, changing the distribution of flip angles that can be realized by changing the pulse moment q gives a different relaxation signal since the cumulative response has changed. This is the essential key in surface NMR that opens the door for imaging the subsurface. There is a significant dead time between the shutdown of the excitation field and signal recording; it ranges in length from some milliseconds to 40 ms. The length of the dead time depends on the instrument used and the processing applied to the time series. For instance, a broad band detected signal may have a dead time of 5 ms. In many cases, it is necessary to improve the signal quality by a band-pass filter centered around the Larmor frequency, e.g. 100 Hz. The dead time is then increased by 1/filter-bandwidth, i.e. 10 ms (Walsh et al 2011), because of the filter transitional function and results in 15 ms total dead time. In general, this rather long dead time restricts surface NMR to detect only those relaxation signals of sufficiently long relaxation times (Dlugosch et al 2011) and does not currently allow us to detect clay-bound water or ice.
The initial amplitude of the signal, which is the amplitude of the flipped net-magnetization, is proportional to the number of protons excited by the excitation field and is therefore proportional to the amount of detected water. In the laboratory, the initial amplitude is the amplitude after pulse shutdown. A calibration measurement using a water sample is made to derive the calibration factor. The calibration factor is applied afterwards to calculate the water content of an unknown sample from the measured initial amplitude.
In surface NMR, long pulses of minimum 10 ms up to 40 ms are needed. During this pulse time τ p significant relaxation takes place. To account for this, it is suggested to obtain the initial value by extrapolation to the time −τ p /2 (Walbrecker et al 2009), i.e. an extrapolation to the time in the middle of the pulse. For the conversion from amplitudes into water content a simple factor does not exist. This is not because known water samples do not exist (e.g. lake measurements). But parameters that do not change in the laboratory such as the coil diameter or parameters that do not influence the signal for small coils such as sample conductivity do significantly influence the measured amplitude for surface NMR. Each variation of these parameters will change the calibration factor. In addition, an amplitude measured for a certain pulse moment reflects not only the cumulative amount of water but also the location of the sub-amounts behind this cumulative amount. For instance, the same amount of water at different depths results in different signal amplitudes, all of which prohibits a simple calibration.

6
The idea of surface NMR can therefore be summarized to apply several FID experiments with different pulse moments q. Due to the dependence of the flip angle distribution on the pulse moment, each recorded FID illuminates a certain range of depth in the subsurface. A set of FIDs for different q's is then inverted from measured amplitudes in this q space to true depth or (more general) three-dimensional (3D) spatial location of subsurface water content. This is done by a detailed forward modeling of the surface NMR problem using a subsurface model as input for calculating theoretical signals that must account for all relevant physics. These theoretical signals are then compared with the measured signals. The inversion progress varies the subsurface model until both match within the measurement uncertainty.

Mathematical description of the forward and inverse problem
The main task of the forward problem for surface NMR is to correctly calculate the magnetic fields of a surface coil in 3D for a conductive subsurface and non-flat surface. The state-of-theart mathematical formulation of this problem (Mueller-Petke and Yaramanci 2010) calculates the induced voltage for a certain pulse moment q and recording time t: while G(r, q) is the kernel function containing the spatial sensitivity and m(r, T * 2 ) the distribution of water content as a function of the spatial position r and decay time T * 2 . The kernel function G(r, q) is written as (Weichman et al 2000, Hertrich 2008) with B + T (r) being the transmitter field; B − R (r) the receiver field, b R , b T and b 0 unit vectors pointing in the direction of the receiver, the transmitter and the earth's magnetic field and ζ T (r) + ζ R (r) are phases of the transmitter and receiver fields according to a point r in the subsurface.
The kernel function is calculated in 3D and includes the configuration and site dependent parameters such as loop size, subsurface resistivity, the earth's field conditions and temperature. The kernel function is of highly oscillating character due to the sine term in equation (4). To sample this adequately an equidistant sampling in the log-space is applied. Close to the coil wire, the cell size is only a few cubic centimeters and increases with distance from the coil.
The state-of-the-art approach estimating the subsurface water content m(r, T * 2 ) from the measured data set V (q, t) is the QT inversion (Mueller-Petke and Yaramanci 2010). The QT inversion (QTI) takes the complete data set D = V (q, t) into account and provides the full solution m(r, T * 2 ). However, if only the distribution of water content m(r) is the focus of interest, the initial value inversion (IVI) is often sufficient. As schematically shown in figure 3 the measured FIDs for each pulse moment are fitted by a mono-exponential function to calculate the initial values E 0 . These initial values as a function of the pulse moment are then inverted for subsurface water content distribution using the inverse operator of the kernel. Obviously, this approach is valid only if the FIDs are mono-exponential. This is true for (i) single aquifer structures showing mono-exponential behavior (which is mostly observed) or (ii) several aquifers that have the same decay time (Mohnke and Yaramanci 2005). 3. Measurements on a frozen lake: assessment of surface nuclear magnetic resonance-measured voltages

Field setting
At the upper Einersberger Lake (Harz Mountains, Germany) measurements were made in winter 2010. The Einersberger Lakes are artificial barrier lakes built for energy supply. According to the task of checking surface NMR-estimated water content, the lake was chosen due to • low artificial electromagnetic noise, i.e. good signal quality • well-known geometry/bathymetry for precise forward modeling • sharp interface of the water layer and the hard rock, used to check the resolution capabilities • thick winter ice shield (30 cm) for safety. The campaign was supported by ground penetrating radar (GPR) measurements to provide the bathymetry independent of the lake construction plan. Additionally, a plumb sink was used to measure the depth of the hard rock bottom below the loop. Both measurements confirm the information on the construction plan.
To keep the task of checking the bulk water content with surface NMR simple, 3D conditions should be avoided. Even though the forward modeling is able to handle 3D water content distributions, a 3D inversion needs more than one measurement. Since an aim of this study was also to check the resolution capabilities of the method, such conditions need to be avoided. Therefore, 2D kernel functions of different loop sizes were calculated and placed at several different positions on the lake to visualize the sensitive volumes and to find a  . Absolute values of the 2D kernel function for 2 As pulse moment, lake bathymetry measured by GPR (red) and depth below the loop center measured by the plumb sink (green).
configuration that shows flat bathymetry in the sensitive area. Figures 4 and 5 show the final position and the kernel function of the final configuration using a 30 m diameter circular loop with two turns. Similar to 1D kernel functions, the penetration depth increases with increasing 9 pulse moment, i.e. the largest pulse moment (2 As) has the largest penetration depth and is therefore most appropriate for visualization. Clearly, the sensitivity is restricted to the area of relatively flat bathymetry, detected by the GPR measurements.
The sensitivity of the 30 m loop and its position on the lake allow us to reduce the threedimensionality to a 1D layered subsurface with • layer 1 (ice): 0.3 m thickness, 0% water content due to instrument's dead time • layer 2 (bulk water): 7.5 m thickness, 100% water content • layer 3 (hard rock-gray wacke): 0% water content while the given water content refers to a surface NMR-detectable water content that is dependent on the dead time. According to the dead times in surface NMR that are longer than 10 ms, the ice layer contains no detectable water. The gray wacke-hard rock bottom has very small porosity and therefore no detectable water content.
The measurement finally consist of 32 pulse moments q each with 10 stacks. The pulse length was 10 ms and the instrument's dead time is 10 ms before signal processing.

Signal processing
Latest surface NMR devices provide the recorded time series as broad band data records at 50 kHz sampling rate instead of providing only the envelopes of the records. Such broad band records enable advanced techniques for noise cancellation (Walsh 2008). With the availability of such records, signal processing becomes an important issue. The first step in processing is a digital synchronous detection or demodulation of the measured signal with the frequency of the excitation field (Levitt 1997). A low-pass filter is then applied to improve the signalto-noise ratio. The low-pass filter is a Butterworth filter of fifth order and is applied forward and reverse to ensure zero phase behavior. The next step is to resample the signal according to the low-pass filter. Due to the well-known Nyquist criterion, there is no need to keep a signal that is sampled with 50 kHz after a low-pass of e.g. 1000 Hz. The resampling rate should be selected according to low-pass frequency or more accurately calculated according to the filter transitional function. The resampling ensures that neighboring data are still independent after processing. This is a necessary step to calculate a reliable error, as shown later. At least 10 measurements for each pulse moment are stacked together. The most significant difference to the original processing scheme of the surface NMR device is the synchronous detection, i.e. the demodulation of the NMR signal that oscillates with the Larmor frequency down to the envelope signal. This allows for low-pass filtering and stacking of the envelopes instead of bandpass filtering and stacking the original times series. Without synchronous detection resampling according to the filter would be impossible since the oscillating NMR signal had to be sampled according to the oscillation.
Next, a complex fitting is realized by a nonlinear fitting algorithm providing complex amplitude, decay time and frequency offset (Legchenko and Valla 1998). A frequency offset usually occurs since the synchronous detection is done with the frequency of the excitation field. But the earth's magnetic field shows slight changes with time and space. This may result in a frequency offset of the measured FID with the synchronous detection frequency of some hertz. The initial amplitudes needed for the initial value inversion are derived from this complex fit by extrapolation. Since for the lake measurements 32 pulse moments were used, 32 initial values are obtained. These 32 initial values are later used for the IVI for both layered and smooth inversion. The set of these initial values is often referred to as the sounding curve.
The results of the complex fit are then used to rotate the signal. In this case, rotation means to correct for the phase and frequency offset in order to get a signal that has zero phase and zero frequency offset. If the data are perfectly rotated, only the real part contains NMR signals. For noise-free records the imaginary part would then be zero. But measured signals always contain noise. By the synchronous detection the noise is separated into real and imaginary parts. Since the noise is not a single frequency but mostly shows a white spectrum, both real and imaginary parts show the same noise level and rotating has no effect on these noise levels. Thus, after rotation the real part of the signal contains noise and the FID, while the imaginary part contains only the noise. An example FID is presented in figure 6. The FID data are rotated into the real part, while the noise is equal in both real and imaginary parts. Even though the rotation of the data has no impact on the calculation of the initial values that are obtained directly from the complex fitting, the rotation is needed to estimate the noise level of the measurements.

Data error estimation
The complete surface NMR data set V (q, t) of the lake measurement consists of 32 pulse moments q and therefore 32 FID records each with 10 stacks. Each FID record is 500 ms long and according to the 50 kHz sampling consists of 25k samples. We refer to such a single sample as a single datum V (q = const, t = const). Each single datum is the mean of 10 stacks. Note 11 that, after the synchronous detection, the record is complex and therefore the single datum is complex.
We distinguish two data errors. The first error D is the standard deviation of a single datum of the stacked FID record, i.e. the standard deviation of the measured voltages after stacking and synchronous detection. The calculation of the standard deviation can be performed individually for real and imaginary parts. As mentioned above, each FID consists of 10 stacks. Thus, a single datum is the mean of 10 stacks and D is the standard deviation of this mean. For the complete data set, 32 × 25k × 2 (32 pulse moment × 25k samples × real + imaginary parts) D exist. In a first-order approximation only one value for the complete data set is needed. Thus, D can be calculated as • Standard deviation of the 10 stacks for a single datum divided by the square root of 10, i.e. the standard deviation of the mean. This results in 32 × 25k × 2 values that are then averaged to one.
• Standard deviation of the 25k samples of a record V (q = const, t) that is 10 times stacked and contains no NMR signal, i.e. a noise record before the measurement. This results in 32 × 2 values that are averaged to one. This needs a noise record before the NMR measurement that is not available for the GMR device.
• Standard deviation of the imaginary part of a 10 times stacked record V (q = const, t) that contains the NMR signal in the real part after rotation.
We used the latter. An example is shown in figure 6. This error contains the artificial noise and the instrumental noise. According to the manufacturer's manual (Vista Clara 2010), instrument noise is given as 500 pV (Hz) −1/2 . The second error E 0 is the standard deviation of the initial amplitude E 0 . This error is below D. This can be understood from a simple weighting example. We weight a sample 25k times and obtain a data set of 25k data D. The error D is the standard deviation of each weighting. This might be the accuracy of the instrument. For the surface NMR example above, this is already derived from 10 stacks as a deviation of the mean. Nevertheless, we want to know the weight of the sample and thus the mean of all weighting is calculated. For the surface NMR example, this is E 0 . The error E 0 is the standard deviation of the mean that is given by the square root of the number of independent weights ((25k) −1/2 ) below D. It is important that these measurements are independent. However, the calculation is more difficult.
The second error E 0 is calculated from the mono-exponential fit and taking the extrapolation into account. The variances σ E and σ T 2 * of the parameters of the mono-exponential fit (amplitude E and decay time T * 2 ) are calculated using the diagonal elements of the covariance matrix. The covariance matrix for a set of model parameters m is given as (Aster et al 2005) with G being the forward operator for the mono-exponential fitting. Finally, the standard deviation of E 0 has to include the extrapolation error from E (the amplitude at t d = t deadtime ) to E 0 that is the amplitude at t = −τ p /2 (see figure 6). Extrapolation back to τ p /2 is necessary since the pulse length cannot be neglected (Walbrecker et al 2009). The extrapolation error is calculated using Gaussian error propagation and leads to  Table 1 shows both errors D and E 0 as a function of the signal processing applied to the recorded signals. Two effects are of interest. The effect of decreasing D with decreasing low-pass is exactly what low-pass filtering should do. So this effect is intended. The signal quality is improved by narrow filtering. If we assume a white noise signal and filter this signal with a low-pass filter of different low-pass frequencies, the amplitude after filtering decreases with decreasing low-pass frequency. Of course one should keep in mind the band width of the NMR signal. In the case of very long relaxation signal the band width of the FID signal is very small but for fast decaying signals this becomes important. Secondly, for a constant low-pass filter (e.g. 100 Hz) E 0 increases with decreasing sampling rate. Thus, one may ask which resampling is correct, i.e. which error estimation is correct since the error of the initial value derived from the complex fitting should depend only on the filter applied. This effect belongs to the calculation of E 0 . The variances σ E and σ T 2 * are correct only if all data are independent measurements. If a filter is applied to the FID record but no resampling is applied, then dependent measurements exist and the variances are underestimated by the amount of oversampling. This again can be understood from the simple weighting example. We weight a sample 100 times and calculate the mean and standard deviation. Then we average 10 single (independent) measurements to one measurement and get 10 resulting measurements. The mean and standard deviation of these 10 would be the same as before. This would be the same as filtering with resampling. Filtering without resampling would mean to make a running average, i.e. the first measurement contains the average of the measurements 1-10, the second is 2-11 and so on. One would still get 100 (dependent) measurements. The mean would still be correct but the standard deviation would be much smaller. Thus, to obtain correct error estimates for E 0 , the record has to be resampled according to the low-pass filter frequency. As a rough but working rule the Nyquist criterion suggests twice the filter frequency. In detail the filter transitional function, i.e. the weighted average in time domain, is to be analyzed in order to calculate a correct resampling rate. In table 1 the correctly estimated error is marked in red. It shows that E 0 increases only slightly with increasing low-pass filter frequency. This is reliable since with decreasing low-pass frequency, E 0 decreases but the number of independent measurements also decreases, which is a contrary effect to the standard deviation of the mean, i.e. E 0 . If the noise were purely white, E 0 would be constant. In the following, we use the 1000 Hz low-pass filtered data. In any case, it should be avoided using a low-pass filter frequency that is below the NMR signal band width. In this case, the band width of the signal is very small due to the long relaxation time, and a low-pass frequency of 100 Hz would be appropriate. Nevertheless, usually surface NMR data contain faster decaying signals and a low-pass filter of 1000 Hz is more appropriate. The error E 0 of the measured data is below the size of the markers. The deviation between observed d obs and estimated d est data, i.e. the fitting quality of the inversion, is given as rms error. Middle: estimated subsurface water content. Right: subsurface according to plumb sink measurement.

Water content estimation by layered inversion
The measured signals are mono-exponential with a T * 2 decay time of about 1 s. Fitting the data with a decay time distribution did not improve the data fit significantly. This mono-exponential fitting and extrapolation of the initial values allow us to use the simple IVI scheme. Since surface NMR measures T * 2 , i.e. an enhanced relaxation rate compared to T 2 due to diffusion and de-phasing (Dunn et al 2002), the observed T * 2 of 1 s indicates a very homogeneous earth's field.
The resulting curve of initial values for all pulse moments (also called the sounding curve) is then inverted using an IVI scheme that allows for sharp layer boundaries. For checking the measurements and forward modeling it would be sufficient to calculate a synthetic sounding curve using the known geometry and water content and compare it to the measured. If both match within the measurement uncertainty, a proof of concept would be presented. But not only a proof of concept is necessary to apply surface NMR successfully. The geophysical results are the results after inversion. Thus the inversion results should be verified. The inversion routine changes the layer thicknesses and its water content as long as the forward calculated sounding curve matches the measured. The results shown in figure 7 are in very good agreement with both the expected geometry and water content. In particular, the bulk water content of the lake is perfectly estimated. Obviously, if the geometry assumptions for the inversion, i.e. sharp boundaries and 1D conditions, are valid, the layered inversion can estimated the subsurface water content very accurately. Based on this result, we can recheck the error estimations of the initial values E 0 (see table 1). For the layered inversion and this well-known case of two sharp boundaries, over-fitting the data is rarely possible since the inverse problem is over-determined (32 data points and 5 free parameters) and the solution is a least-squares fit to the data. The data fit, i.e. the deviation in terms of root mean square (rms) between measured and estimated data, of the layered inversion of 6 nV rms error (figure 7) should equal the average error of the initial values E 0 . Note that this is true only for this 'controlled' case where we know that the subsurface is built of two sharp boundaries. This error nicely fits the error calculated during the signal processing and confirms the results derived in this section. Nevertheless, this is one solution of the inverse problem that explains the data. Are there other models that explain the data sufficiently, i.e. within their error bounds? All other combinations of layer thickness and water content that result in the same data fit are true as well. By the variation of the parameter in a range to fit the measured data within 6 nV, the model accuracy of the water layer is found and summarized to • layer 1: 0.35 ± 0.1 m thickness, 0.01 ± 0.01 m 3 m −3 water content.
• layer 3: 0.005 ± 0.01 m 3 m −3 water content. These uncertainty results are derived assuming an error-free forward modeling, i.e. they include only the measurement error. We discuss additional errors that worsen these results in section 4. In anticipation of the discussion in this section, an unknown temperature profile in the water layer and the uncertainty of the flatness of the bathymetry below the loop leads to • layer 1: 0.35 ± 0.1 m thickness, 0.01 ± 0.01 m 3 m −3 water content.

Water content estimation by smooth inversion
In addition to layered inversion, smooth inversion estimates the water content for a fixed model discretization, i.e. the layer thickness is not a parameter of optimization but set, for instance, to 0.2 m. Thus, the subsurface is discretized by layers of 0.2 m and the water content of each layer is to be estimated. Since the problem is under-determined (32 data points and 150 free parameters) it has to be stabilized by introducing smoothness constraints that are controlled by a regularization parameter. The regularization parameter is chosen to satisfy both data fit and model smoothness. We use a solver based on the generalized singular value decomposition as described for this problem (Müller-Petke and Yaramanci 2008). This concept of inversion leads to a smooth distribution of the water content and not to sharp boundaries. However, it is often used either if no a priori information is available or if sharp boundaries cannot be assumed. Concerning the case presented here, the smooth inversion is not very appropriate but is presented to show the differences from the layered inversion if no a priori information is available. To additionally constrain the inversion, an upper and a lower boundary for the water content are realized using a scaled and shifted tangent transformation (Mueller-Petke and Yaramanci 2010). The lower boundary is set to zero, while the upper boundary is set to 1.1. The error E 0 of the measured data is below the size of the markers. The deviation between observed d obs and estimated d est data, i.e. the fitting quality of the inversion, is given as rms error. Middle: estimated subsurface water content (blue) and its uncertainty. Right: subsurface according to plumb sink measurement.
The variances of the model parameter are derived from the diagonal elements of the covariance matrix (i.e. covariances are neglected) using the data error of the initial values E 0 as calculated from the mono-exponential fitting. Note that the model variances (see equation (5)) have to be calculated using the same inverse operator as that for estimating the water content model. Thus, they are calculated in the tangent transformed model space.
The results are shown in figure 8. It is remarkable that the data fit is worse than for the layered inversion. The simple smoothness constraints (constant for the complete depth range) avoid estimating a model with sharp boundaries and keep the water layer smooth. A data fit with higher accuracy can be obtained with a very small regularization parameter, however, this would result in highly oscillating water content distribution. Since the problem is stabilized by additional constraints the model variance is larger than for the layered inversion. As expected from a smooth inversion the layer boundaries are not sharp. Both upper and lower boundaries show a smooth transition of the water content over a depth of approximately 0.75 m. This transition zone symbolizes the resolution of the method using smooth inversion. Due to this smoothing the water content of the ice layer is too high. The water content of the lake is estimated as 1 ± 0.05 m 3 m −3 , which confirms the water content estimated from the layered inversion. What is noteworthy is the decrease in water content below 5 m depth, which cannot be explained within water content uncertainty. It might be an effect of the resolution alone, but it is more likely to be a cumulative effect of the resolution of the smooth inversion, the temperature increase at the bottom of the lake and the not perfectly flat bathymetry.

Discussion
The measurement made on top of a frozen lake yields data that can be explained by a stateof-the-art forward modeling using the known geometry and water content of the lake. Does this prove both the modeling and measurement techniques? It is indeed not 100% proof since both could be wrong and errors could compensate for each other. However, it is rather unlikely that measurement error of the instrument compensates for errors of the modeling.
How reliable are the uncertainty estimates? If the modeling and measurement technique is, in principle, correct, the uncertainty of the obtained subsurface model depends on the accuracy of the measurement and the accuracy of the modeling. But how to check these accuracies? For instance, to check the accuracy of an ohmmeter, a precisely known resistor is needed. Transferred to a geophysical field technique, if modeling accuracy is significantly better compared to the measurement accuracy, the determination of the measurement accuracy is possible via measuring and modeling a known object, just like the lake measurements presented here. Usually the parameters used for modeling are not known with arbitrary accuracy for field measurement. The main parameters, their influence and accuracy are • The depth of the hard rock basement is known with an accuracy of some centimeters by the plumb sink. But the flatness of the basement is only known from the GPR measurements with an accuracy of some decimeters.
• The temperature of the lake is set at 275 kelvins during kernel calculation. We can expect a variation of 2 kelvin since the top of the lake is frozen and at the bottom it is usually 277 kelvin. For the net magnetization this temperature variation yields ≈1.5% error.
• The water conductivity was measured as 34 µS cm −1 or ≈300 m. Even large variations in the conductivity, which are unlikely, will have no influence on the kernel function for conductivities below 100 µS cm −1 (Braun and Yaramanci 2008).
• The kernel is of highly oscillating character. The sampling close to the coil wire must be very dense. We use a logarithmic equidistant spacing with a few cubic-centimeters cell size close to the wire. The error introduced by this spacing is below the nV range.
• The earth's magnetic field changes with time and depth. The analysis of the data showed that the Larmor frequency variation is below 0.5 Hz. This has a relevant effect neither for the kernel calculation nor for the NMR physics (Walbrecker et al 2011).
This leads to an overall uncertainty in the forward modeling of some nanovolts. Determination of the modeling uncertainty by very precise measurements is the other way. The instrument accuracy of the GMR is given as 500 pV (Hz) −1/2 . But this is only the instrumental noise. Surface NMR measurement usually contains high artificial electromagnetic noise. We presented a scheme to determine this error and presented an error of about 6 nV for this measurement. Consequently, neither modeling accuracy nor measuring accuracy can be determined using such field measurements. Nevertheless, if either the measurement or the modeling contained an error larger than the ones taken into account, the inversion would not lead to the expected subsurface. Therefore, one can rely on the values obtained here if both modeling and measurement errors are taken into account.
How useful are the uncertainty estimates? The uncertainty of the model presented here is not valid for any other case since the uncertainty depends on the measurement quality, i.e. the signal-to-noise ratio and the 'true' subsurface. For instance, the resolution of the method decreases with increasing depth, i.e. for a lake bottom at a depth of 50 m the uncertainty would not be much larger. But we have presented a scheme to obtain the uncertainty of the derived model that can be used to determine the model quality and this scheme is valid for all other cases.

Conclusion
Surface NMR measurements were made on a lake to check the reliability of the estimated subsurface water content distribution. We found that the method is able to yield the expected water content of 100% as well as the upper (ice layer) and lower (hard rock) boundaries of the lakes. Therefore, both the measuring and the forward modeling technique are assumed to be sufficiently correct. We showed that a reliable data error, necessary to calculate reliable model variances, can be derived from the data only if a resampling according to the low-pass filter frequency is applied. As a rough but working rule, twice the filter frequency is suggested. We obtained an uncertainty of the estimated water content that is only a few per cent for these measurements, i.e. a high S/N ratio and a shallow target of investigation. Finally, we observed long T * 2 times of 1 s for bulk water in the earth's field conditions.