Failures of Minimum Variance Analysis in Diagnosing Planar Structures in Space

Minimum variance analysis of the magnetic field (MVAB) is the most widely used of the various techniques for analyzing planar structures, owing to its numerical simplicity and loose data requirements. A global picture of solar wind intermittency has been established through a large number of MVAB-based studies. However, the large discrepancy between the results of MVAB and those of other techniques, such as timing/triangulation, implies that the uncertainty of MVAB is a crucial issue that is not fully understood. Using Cluster data, we establish a data set of 6752 discontinuities whose orientations are precisely determined by timing, and use it as a benchmark for testing MVAB. We find that the scatter of the MVAB normals around the timing normal can be reduced by raising the threshold for the eigenvalue ratio $\lambda_2/\lambda_3$ and by narrowing the data window to which MVAB is applied. The misidentification of discontinuities with $|B_N|/|\mathbf{B}| < 0.4$ and $\Delta|\mathbf{B}|/|\mathbf{B}| < 0.2$ as rotational discontinuities (RDs, identified by $|B_N|/|\mathbf{B}| > 0.4$ and $\Delta|\mathbf{B}|/|\mathbf{B}| < 0.2$) is shown to be a major and inherent defect of MVAB, which can occur even when $\lambda_2/\lambda_3$ is large. This misidentification is found to be related to a particular discontinuity geometry, and it also explains the spurious predominance of RDs reported by previous MVAB-based studies. Finally, we offer advice on the application of MVAB and discuss whether the true statistical properties of interplanetary discontinuities can be obtained with MVAB.


Introduction
A number of techniques have been developed for the analysis of planar structures in space, such as interplanetary discontinuities (IDs; Lepping & Behannon 1986; Horbury et al. 2001; Söding et al. 2001; Tsurutani et al. 2005; Neugebauer 2006; Zhang et al. 2008; Neugebauer & Giacalone 2010; Wang et al. 2013; Yang et al. 2015; Liu et al. 2019, 2022a, 2022b; Krasnoselskikh et al. 2020; Akhavan-Tafti et al. 2021), the terrestrial magnetopause (Khrabrov & Sonnerup 1998a; Haaland et al. 2004; Sonnerup et al. 2004; Shi et al. 2006; Fu et al. 2019a; Wang et al. 2020), and the magnetotail plasma sheet and dipolarization fronts (Nakamura et al. 2002; Runov et al. 2009; Cao et al. 2010; Fu et al. 2012a, 2012b, 2013, 2019b, 2020; Huang et al. 2015; Xu et al. 2018). Some techniques, such as triangulation/timing, require multispacecraft measurements (Burlaga & Ness 1969; Horbury et al. 2001; Dunlop et al. 2002; Knetter et al. 2003, 2004; Haaland et al. 2004; Zhang et al. 2022). For a structure moving at a constant velocity, a well-configured array of at least four spacecraft can determine its orientation and speed simultaneously from the time delays between the spacecraft. Other techniques are designed for single-spacecraft missions. These mainly include minimum variance analysis of the magnetic field (MVAB), which determines the normal of a planar structure as the direction along which the magnetic field component has the smallest variance (Mazelle et al. 1997; Cao et al. 2009, 2013); minimum mass flux residue, which requires the mass flux along the normal to be conserved across a boundary; minimum Faraday residue, which relies on a constant tangential electric field in the frame comoving with the structure (Terasawa et al. 1996; Khrabrov & Sonnerup 1998a); and the cross-product method, which determines the normal as the cross product of the magnetic field vectors on the two sides under the assumption of a vanishing normal magnetic field (Knetter et al. 2004; Liu et al. 2015). Owing to its numerical simplicity and loose data requirements, MVAB has become the most widely used technique, especially in ID studies (Smith 1973; Neugebauer 2006; Neugebauer & Giacalone 2010).
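To make the MVAB definition above concrete, here is a minimal sketch in Python with NumPy (not the code used in any of the cited studies): the normal is taken as the eigenvector of the magnetic variance matrix $M_{ij} = \langle B_iB_j\rangle - \langle B_i\rangle\langle B_j\rangle$ associated with the smallest eigenvalue $\lambda_3$. The function name `mvab` and the synthetic test field are illustrative assumptions.

```python
import numpy as np


def mvab(b_vectors):
    """Minimum variance analysis of a (K, 3) array of magnetic field samples.

    Returns (n3, (lambda1, lambda2, lambda3)), where n3 is the unit
    minimum-variance direction and the eigenvalues are in descending order.
    """
    b = np.asarray(b_vectors, dtype=float)
    # Variance matrix M_ij = <B_i B_j> - <B_i><B_j> (bias=True divides by K).
    m = np.cov(b, rowvar=False, bias=True)
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(m)
    lam3, lam2, lam1 = eigvals
    n3 = eigvecs[:, 0]
    return n3, (lam1, lam2, lam3)


# Synthetic rotational-type crossing: the field rotates in the x-y plane while
# B_z stays near 0.5 with small random fluctuations, so the normal is +/- z.
rng = np.random.default_rng(0)
s = np.linspace(-1.0, 1.0, 200)
phi = 0.5 * np.pi * np.tanh(5.0 * s)
field = np.column_stack([np.cos(phi), np.sin(phi), 0.5 + 0.01 * rng.standard_normal(s.size)])
n3, (lam1, lam2, lam3) = mvab(field)
ratio = lam2 / lam3  # the eigenvalue ratio used as a quality indicator
```

The sign of `n3` is arbitrary, as for any eigenvector, so comparisons with other normals should use $|\mathbf{n}_a\cdot\mathbf{n}_b|$.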
In the practical application of MVAB, its error is a key issue. Estimating this error is difficult because of the complicated form of the eigenvalue decomposition; nevertheless, much effort has been devoted to it at both theoretical and numerical levels (Sonnerup 1971; Lepping & Behannon 1980; Kawano & Higuchi 1995; Khrabrov & Sonnerup 1998b). For example, Kawano & Higuchi (1995) employed the bootstrap method, a general nonparametric technique used in many fields, to estimate the errors of MVAB results. The bootstrap method assumes that the observed data series is a superposition of the real magnetic field pattern and Gaussian white noise. The influence of the white noise on the MVAB results can then be simulated by repeatedly resampling the observed data series, although this demands considerable computing power. Sonnerup (1971) derived an analytical error estimate (…), in which n and λ represent the eigenvectors and eigenvalues, Δ and σ are the random error and standard deviation, and the subscripts 1, 2, and 3 correspond to the maximum, intermediate, and minimum variance axes, respectively. Later, Khrabrov & Sonnerup (1998b) provided a more complete analytical treatment and derived the error bound of $\mathbf{n}_3$ under different conditions as a function of the three eigenvalues and the number of data points. It is worth noting that the approaches of Sonnerup (1971) and Khrabrov & Sonnerup (1998b) also rely on the assumption of independent and identically distributed errors in the data series, and focus only on error generation and propagation in the numerical procedure. Both demonstrated that a large eigenvalue ratio $\lambda_2/\lambda_3$ indicates reliable MVAB results. In practice, the threshold for $\lambda_2/\lambda_3$ is specified empirically, mostly between 1.5 and 3 (Smith 1973; Lepping & Behannon 1980; Söding et al. 2001; Wang et al. 2013; Liu et al. 2021, 2022a). Lepping & Behannon (1980) tested MVAB with simulated data and concluded that a threshold of 1.8 is sufficient to obtain a meaningful result. However, Knetter et al. (2004) compared the results of MVAB and timing for 129 IDs and suggested applying MVAB only if $\lambda_2/\lambda_3 > 10$. In our previous work, we also noticed occasional failures of MVAB even with a large $\lambda_2/\lambda_3$ (Liu et al. 2021, 2022a). These results indicate that the error estimates above are oversimplified and that some technical and physical factors relevant to the performance of MVAB have been overlooked. The global picture of solar wind intermittency, previously established through a number of MVAB-based studies, therefore faces serious challenges. In this study, we test the MVAB method at the application level using data from the Cluster spacecraft (Escoubet et al. 2001). Potential factors affecting the performance of MVAB are investigated. Based on the test results, we provide advice on the application of MVAB and reassess the previously established picture of solar wind intermittency. This paper is organized as follows. In Section 2, we introduce the spacecraft data and the data set we established for testing MVAB. In Section 3, we investigate the dependence of the MVAB accuracy on relevant factors and discuss its efficacy. Section 4 presents a summary and discussion.
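The bootstrap idea attributed above to Kawano & Higuchi (1995) can be illustrated with a generic nonparametric bootstrap: the K samples are resampled with replacement, MVAB is repeated on each resample, and the spread of the resulting normals around the original one is summarized. This is a sketch under stated assumptions, not their implementation: the number of resamples, the median angular deviation as the summary statistic, and the helper names are all illustrative choices.

```python
import numpy as np


def mvab_normal(b):
    """Minimum-variance eigenvector and lambda2/lambda3 of a (K, 3) field array."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(b, rowvar=False, bias=True))
    return eigvecs[:, 0], eigvals[1] / eigvals[0]


def bootstrap_normal_error(b_vectors, n_resamples=500, seed=0):
    """Median angle (deg) between bootstrap MVAB normals and the original normal."""
    b = np.asarray(b_vectors, dtype=float)
    n0, _ = mvab_normal(b)
    rng = np.random.default_rng(seed)
    angles = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample the K data points with replacement and redo MVAB.
        idx = rng.integers(0, len(b), size=len(b))
        n_i, _ = mvab_normal(b[idx])
        # Normals are sign-ambiguous, so compare |n0 . n_i|.
        cosang = np.clip(abs(np.dot(n0, n_i)), 0.0, 1.0)
        angles[i] = np.degrees(np.arccos(cosang))
    return float(np.median(angles))


# Same rotational field with weak and strong added noise (illustrative values).
rng = np.random.default_rng(1)
s = np.linspace(-1.0, 1.0, 200)
phi = 0.5 * np.pi * np.tanh(5.0 * s)
clean = np.column_stack([np.cos(phi), np.sin(phi), np.full(s.size, 0.5)])
err_low = bootstrap_normal_error(clean + 0.01 * rng.standard_normal(clean.shape))
err_high = bootstrap_normal_error(clean + 0.2 * rng.standard_normal(clean.shape))
```

As the text notes, such estimates capture only the noise propagated through the eigenvalue problem; they cannot reveal systematic failures tied to the discontinuity geometry.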

Data Set and Data Analysis
The magnetic field data used in this study are provided by the fluxgate magnetometer (FGM) instruments (Balogh et al. 2001) on board the four Cluster spacecraft from 2002 July 1 to 2008 December 31. We focus only on data in the solar wind, where IDs are abundant and the FGM data contain fewer irrelevant perturbations. Data from some intervals are also discarded because of excessive data gaps. The cumulative duration of the retained data is ∼312 days for each spacecraft. ESA's Cluster Science Archive provides the FGM data at various time resolutions; we use the 5 Hz data.
The discontinuity identification criterion employed in this study is similar to that of Liu et al. (2022a). At each sampling instant t, three intervals are defined (the preinterval …), and two conditions are specified (…). The first condition guarantees that the field jump of a discontinuity is large enough to be distinguished from other irrelevant fluctuations, and the second is a supplementary condition that reduces the identification uncertainty. In this study, we set $c_1 = 4$ and $c_2 = 0.5$. We then scan the Cluster 1 data with this criterion using interval lengths T = 2, 6, and 18 s to identify discontinuities at different spatial scales. In total, 21,978 events are identified. Figures 1(b)-(e) display the observational conditions of these 21,978 discontinuities, where each dot represents an event. Figure 1(b) shows the average separation distance between the four spacecraft, labeled $L_{SC}$, at the time each discontinuity is detected. During the period investigated, $L_{SC}$ ranges from ∼200 km to ∼12,000 km.
A discontinuity suitable for timing must be observed successively by all four spacecraft. Figure 1(a) displays a discontinuity detected on 2003 February 10. The time profiles recorded by different spacecraft crossing a discontinuity sometimes differ. Such discrepancies can appear in magnitude and direction, i.e., the field change vectors have different amplitudes and orientations, or in the form of substructures (e.g., see the black and green curves in Figure 1(a)). This implies a breakdown of the assumption that the discontinuity is planar and uniform. To ensure reliable timing results, such cases should be excluded, and we therefore check the events as follows. We first apply MVAB to the Cluster 1 data to find the magnetic field component (labeled $B_L$) with the largest variation. The discontinuity observed by Cluster 1 is then located precisely at the maximum gradient of $B_L$, as shown by the vertical black line in Figure 1(a), and the corresponding transition interval (the pink shaded area) is defined as a time window centered at the discontinuity and 1.5 times wider than the region where max … . The Cluster 1 data in this transition interval are labeled $B_L(t_1), \ldots, B_L(t_K)$, where $t_1$ and $K$ are the onset time and the number of data points. We then bin the $B_L$ data series from the other three spacecraft with a boxcar of the same length K, stepping in one-sample increments, and label the obtained data segments $B_L(t_k), \ldots, B_L(t_{k+K-1})$, etc. We then minimize … will make the solution unstable.
Thus, the shape of the Cluster tetrahedron should also be taken into account. We use the volume ratio $r_V$, i.e., the ratio of the volume of the Cluster tetrahedron to that of a regular tetrahedron with the same average side length, as a tetrahedron quality indicator … for each event. In this study, we require $r_V > 0.5$, which generally corresponds to a matrix condition number smaller than 10; 7028 of the 13,142 events meet this requirement. Finally, the normal magnetic field of each discontinuity can be estimated as $B_N = \mathbf{B}\cdot\mathbf{n}$. For simplicity, we also label its time series within the transition interval $B_N$. Ideally, $B_N$ should be constant to satisfy the divergence-free condition $\nabla\cdot\mathbf{B} = 0$. Since a perfectly constant $B_N$ is never measured in practice, we quantify its fluctuations by $\sigma(B_N)/\langle|\mathbf{B}|\rangle$, i.e., the standard deviation of $B_N$ normalized by the background magnetic field magnitude. We take the four-spacecraft average of $\sigma(B_N)/\langle|\mathbf{B}|\rangle$ and require it to be smaller than 0.15. Only 276 events, less than 4% of the 7028 retained, fail this requirement. After this careful inspection and selection, we obtain a reliable data set of 6752 discontinuities with their timing results, which allows us to test the efficacy of MVAB under different conditions.
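The event checks described in this section can be sketched as follows; this is an illustration under stated assumptions, not the authors' code. It includes a least-squares search for the best-matching segment (the minimized quantity is not reproduced above, so the sum of squared differences is an assumption), the standard four-spacecraft timing relation $\mathbf{n}\cdot(\mathbf{r}_\alpha-\mathbf{r}_1) = V(t_\alpha-t_1)$ for a planar structure moving at constant speed $V$ along $\mathbf{n}$, the volume ratio $r_V$ (a regular tetrahedron of edge $a$ has volume $a^3/(6\sqrt{2})$), and the indicator $\sigma(B_N)/\langle|\mathbf{B}|\rangle$. All function names are hypothetical, and using the mean of $|\mathbf{B}|$ over the interval as the background magnitude is an assumption.

```python
import itertools

import numpy as np


def best_match_start(ref_segment, series):
    """Start index k where series[k:k+K] best matches ref_segment.

    Assumption: the misfit is the sum of squared differences."""
    ref = np.asarray(ref_segment, dtype=float)
    y = np.asarray(series, dtype=float)
    K = ref.size
    # Boxcar of length K stepping one sample at a time through the series.
    misfit = [np.sum((y[k:k + K] - ref) ** 2) for k in range(y.size - K + 1)]
    return int(np.argmin(misfit))


def timing_solution(positions_km, crossing_times_s):
    """Unit normal, speed (km/s) and condition number from four crossing times."""
    r = np.asarray(positions_km, dtype=float)
    t = np.asarray(crossing_times_s, dtype=float)
    D = r[1:] - r[0]                # separations relative to spacecraft 1 (3 x 3)
    delays = t[1:] - t[0]           # crossing delays relative to spacecraft 1
    m = np.linalg.solve(D, delays)  # m = n / V
    speed = 1.0 / np.linalg.norm(m)
    return m * speed, speed, np.linalg.cond(D)


def volume_ratio(positions):
    """Tetrahedron volume over that of a regular tetrahedron with the same mean edge."""
    r = np.asarray(positions, dtype=float)
    volume = abs(np.linalg.det(r[1:] - r[0])) / 6.0
    edges = [np.linalg.norm(r[i] - r[j]) for i, j in itertools.combinations(range(4), 2)]
    a = np.mean(edges)
    return volume / (a ** 3 / (6.0 * np.sqrt(2.0)))


def normal_field_spread(b_vectors, normal):
    """sigma(B_N) / <|B|> for one spacecraft's (K, 3) transition-interval data."""
    b = np.asarray(b_vectors, dtype=float)
    n = np.asarray(normal, dtype=float) / np.linalg.norm(normal)
    return np.std(b @ n) / np.mean(np.linalg.norm(b, axis=1))


# 5 Hz data: spacecraft 2 sees the same B_L profile 1 s after spacecraft 1.
step = 0.2
tt = np.arange(0.0, 60.0, step)
b1, b2 = np.tanh(tt - 30.0), np.tanh(tt - 31.0)
lag = (best_match_start(b1[140:160], b2) - 140) * step

# Regular tetrahedron (1000 km scale) crossed by a plane with known n and V.
pos = 1000.0 * np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
n_true = np.array([-0.6, 0.8, 0.0])
n_est, v_est, cond = timing_solution(pos, pos @ n_true / 400.0)
r_v = volume_ratio(pos)

# Field rotating in the plane perpendicular to n_true, with constant B_N = 1.
phi = np.linspace(0.0, np.pi / 2, 50)
e1 = np.array([0.8, 0.6, 0.0])
e2 = np.array([0.0, 0.0, 1.0])
b = 5 * np.outer(np.cos(phi), e1) + 5 * np.outer(np.sin(phi), e2) + n_true
spread = normal_field_spread(b, n_est)
```

Following the selection rules stated above, an event would be retained when `volume_ratio` exceeds 0.5 and the four-spacecraft average of `normal_field_spread` is below 0.15; for the regular tetrahedron in the example the condition number is 2.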

Timing Results
Compared with previous ID studies based on triangulation or timing (Horbury et al. 2001; Knetter et al. 2003, 2004), our data set is more general and reliable, since it contains many more events and the application of timing has been carefully checked. It is therefore worth reexamining the ID properties and comparing them with previous findings. A primary advantage of timing over other techniques is that it determines the structure speed directly. Figure 2(a) shows the distribution of the discontinuity speed along the x-direction (i.e., pointing toward the Sun) in the Geocentric Solar Ecliptic (GSE) coordinate system, where the blue and orange histograms correspond to events with $r_V > 0.5$ and $r_V < 0.5$, respectively. Because the solar wind is supermagnetosonic, a positive $V_x$ relative to the spacecraft is unphysical: it would require the discontinuity to propagate sunward in the plasma frame faster than the solar wind convection speed, and hence at a supermagnetosonic speed. Nearly all such timing errors ($V_x > 0$) occur for $r_V < 0.5$, which demonstrates the necessity of checking the spacecraft tetrahedron shape. Figure 2(b) displays the probability density function (PDF) of the angular change ω (i.e., the angle between the upstream and downstream magnetic field directions) of the 6752 IDs (the subset of the 7028 IDs in Figure 2(a) satisfying $\sigma(B_N)/\langle|\mathbf{B}|\rangle < 0.15$). Note that the identification criterion $|\Delta\mathbf{B}| > 0.5\,\langle|\mathbf{B}|\rangle$ excludes some discontinuities with small rotation angles. However, discontinuities with ω larger than $2\sin^{-1}(0.25) \approx 29°$, and hence all those with ω > 30°, are bound to satisfy the criterion and are not affected. Thus, the PDF is reliable for ω > 30° (see the vertical dashed line in Figure 2(b)). The observed PDF is well fitted by $\exp(-\omega/24.1°)$, consistent with previous results (e.g., Table 2 in Neugebauer 2006).
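The bound on ω quoted above can be checked with a short derivation, assuming $\Delta\mathbf{B} = \mathbf{B}_2 - \mathbf{B}_1$ and $\langle|\mathbf{B}|\rangle = (|\mathbf{B}_1| + |\mathbf{B}_2|)/2$, where $\mathbf{B}_1$ and $\mathbf{B}_2$ are the upstream and downstream fields (the exact averaging used in the identification criterion is an assumption here):

```latex
\begin{aligned}
|\Delta\mathbf{B}|^2
  &= |\mathbf{B}_1|^2 + |\mathbf{B}_2|^2 - 2|\mathbf{B}_1||\mathbf{B}_2|\cos\omega \\
  &= 4\langle|\mathbf{B}|\rangle^2\sin^2\frac{\omega}{2}
     + \frac{1+\cos\omega}{2}\bigl(|\mathbf{B}_1| - |\mathbf{B}_2|\bigr)^2
  \;\ge\; 4\langle|\mathbf{B}|\rangle^2\sin^2\frac{\omega}{2}, \\
|\Delta\mathbf{B}| > 0.5\,\langle|\mathbf{B}|\rangle
  &\;\Longleftarrow\; 2\sin\frac{\omega}{2} > 0.5
  \;\Longleftrightarrow\; \omega > 2\sin^{-1}(0.25) \approx 28.96^\circ .
\end{aligned}
```

Equality holds for $|\mathbf{B}_1| = |\mathbf{B}_2|$, so under this assumption no discontinuity with ω > 30° can be rejected by the criterion, whatever its change in field magnitude.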
The rotation angle of the in-plane field, $\omega_{\text{in-plane}}$, is another important geometric parameter of discontinuities. The PDFs of $\omega_{\text{in-plane}}$ for the different discontinuity types are presented in Figure 2(d). Across EDs, the PDF of $\omega_{\text{in-plane}}$ decays exponentially with the angle, whereas across RDs $\omega_{\text{in-plane}}$ is distributed relatively uniformly. These results are discussed further in Section 4.
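As a sketch of how the two angles can be computed from upstream and downstream fields $\mathbf{B}_1$, $\mathbf{B}_2$ and a normal $\mathbf{n}$ (from timing or MVAB), ω is the angle between the full vectors, and $\omega_{\text{in-plane}}$ the angle between their tangential parts $\mathbf{B}_{1,2} - (\mathbf{B}_{1,2}\cdot\mathbf{n})\,\mathbf{n}$. The function names are illustrative.

```python
import numpy as np


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))


def rotation_angles(b1, b2, normal):
    """Return (omega, omega_in_plane) for upstream/downstream fields b1, b2."""
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)
    n = np.asarray(normal, dtype=float) / np.linalg.norm(normal)
    t1 = b1 - np.dot(b1, n) * n  # tangential (in-plane) part upstream
    t2 = b2 - np.dot(b2, n) * n  # tangential (in-plane) part downstream
    return angle_deg(b1, b2), angle_deg(t1, t2)


# Rotational-type example: B_N = 3 on both sides, tangential field turns by 90 deg.
omega, omega_ip = rotation_angles([4, 0, 3], [0, 4, 3], [0, 0, 1])
```

Here ω ≈ 68.9° while $\omega_{\text{in-plane}}$ = 90°, illustrating why a large normal component makes $\omega_{\text{in-plane}}$ exceed ω.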

Dependence on $\lambda_2/\lambda_3$
The error of MVAB is defined as the angle between the normals of a discontinuity determined by MVAB and by timing, labeled $\theta_{\rm MVA\_Tim}$ for simplicity. As noted in the Introduction, the ratio of the intermediate to the minimum eigenvalue, $\lambda_2/\lambda_3$, is an important indicator of the reliability of MVAB results. Because each discontinuity is observed by all four Cluster spacecraft, the data set of 6752 discontinuities provides 27,008 MVAB test results (6752 × 4), which we use to investigate the dependence of $\theta_{\rm MVA\_Tim}$ on $\lambda_2/\lambda_3$. Figure 3(a) shows the PDFs of $\theta_{\rm MVA\_Tim}$ in four intervals of $\lambda_2/\lambda_3$: [1, 3], [3, 10], [10, 20], and [20, ∞]. The percentages in the figure give the proportions of events in each $\lambda_2/\lambda_3$ interval. The MVAB results with $\lambda_2/\lambda_3 < 3$ have no statistical significance, since $\theta_{\rm MVA\_Tim}$ appears to be randomly distributed. As $\lambda_2/\lambda_3$ increases, $\theta_{\rm MVA\_Tim}$ decreases statistically, and the proportion of small $\theta_{\rm MVA\_Tim}$ increases significantly. Specifically, η, defined as the proportion of events with $\theta_{\rm MVA\_Tim} < 30°$, equals 0.35, 0.45, 0.64, and 0.88 in these four intervals, respectively. We then divide the 27,008 results into bins of $\lambda_2/\lambda_3$ and calculate the median of $\theta_{\rm MVA\_Tim}$ (labeled ${\rm med}(\theta_{\rm MVA\_Tim})$) and η in each bin. The results, shown in Figure 3(b), indicate that both ${\rm med}(\theta_{\rm MVA\_Tim})$ and η depend significantly on $\lambda_2/\lambda_3$, and that ${\rm med}(\theta_{\rm MVA\_Tim})$ decreases to 30° at $\lambda_2/\lambda_3 = 8$. We assume that an error below 30° is statistically acceptable, and therefore focus on the results with $\lambda_2/\lambda_3 > 8$ in the following analysis.
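The statistics in this subsection can be reproduced schematically: $\theta_{\rm MVA\_Tim}$ is the angle between two sign-ambiguous normals, folded into [0°, 90°], and for each $\lambda_2/\lambda_3$ bin one records the median θ and η, the fraction of θ < 30°. The inputs below are toy values, not the Cluster results, and the helper names are hypothetical.

```python
import numpy as np


def normal_angle_deg(n_mva, n_tim):
    """theta_MVA_Tim in degrees, folded into [0, 90] because normals have no sign."""
    n_mva = np.asarray(n_mva, dtype=float)
    n_tim = np.asarray(n_tim, dtype=float)
    c = np.abs(np.sum(n_mva * n_tim, axis=-1))
    c = c / (np.linalg.norm(n_mva, axis=-1) * np.linalg.norm(n_tim, axis=-1))
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))


def binned_stats(theta_deg, ratio, edges):
    """(lo, hi, median theta, eta) for each non-empty lambda2/lambda3 bin [lo, hi)."""
    theta_deg = np.asarray(theta_deg, dtype=float)
    ratio = np.asarray(ratio, dtype=float)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (ratio >= lo) & (ratio < hi)
        if sel.any():
            eta = np.mean(theta_deg[sel] < 30.0)  # fraction with theta < 30 deg
            rows.append((lo, hi, float(np.median(theta_deg[sel])), float(eta)))
    return rows


theta = normal_angle_deg([[0, 0, 1], [0, 1, 0], [0, 0, -1]],
                         [[0, 0, 1], [0, 0, 1], [0, 0.5, 0.866]])
rows = binned_stats([10.0, 80.0, 20.0, 40.0], [2.0, 2.5, 12.0, 15.0], [1, 3, 10, 20, np.inf])
```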

Dependence on Discontinuity Geometry
The $\theta_{\rm MVA\_Tim}$ distributions shown in Figure 3(a) exhibit an anomalous feature: as $\theta_{\rm MVA\_Tim}$ increases from 0° to 90°, the probability density first decreases, then increases and forms a hump at 90°, even when $\lambda_2/\lambda_3$ is large. This implies that factors other than $\lambda_2/\lambda_3$ significantly affect the accuracy of MVAB. We speculate that the hump is related to a particular discontinuity geometry and examine the geometric properties of the discontinuities to verify this conjecture. … of MVAB. In contrast, $\Delta|\mathbf{B}|/|\mathbf{B}|$ and the angular change ω do not depend on the discontinuity normal and can therefore be known in advance. Figure 4(d) presents the relation between $\theta_{\rm MVA\_Tim}$ and ω. Figures 4(c)-(d) demonstrate that MVAB can achieve acceptable accuracy when either $\Delta|\mathbf{B}|/|\mathbf{B}| > 0.05$ or ω > 60°.
A natural question is why EDs with ω < 60° and $\Delta|\mathbf{B}|/|\mathbf{B}| < 0.05$ are likely to be misidentified by MVAB. To answer it, we compare the timing and MVAB results, especially for events with $\theta_{\rm MVA\_Tim} > 60°$. In Figure 5(a), the $\theta_{\rm MVA\_Tim}$ distribution shown in Figure 4(a) is replotted with black (for $\theta_{\rm MVA\_Tim} < 60°$) and blue (for $\theta_{\rm MVA\_Tim} > 60°$) dots, but as a function of $|B_N|/|\mathbf{B}|$ and $\omega_{\text{in-plane}}$ determined by timing. For the blue dots, MVAB does not give reliable normal estimates and therefore yields incorrect $|B_N|/|\mathbf{B}|$ and $\omega_{\text{in-plane}}$; these incorrect MVAB values are shown as red dots for comparison. For clarity, the top view of Figure 5(a) is shown in Figure 5(b). The misidentification follows a clear systematic pattern: EDs with small $|B_N|/|\mathbf{B}|$ and $\omega_{\text{in-plane}}$ (blue dots) tend to be misidentified by MVAB as RDs with dominant normal magnetic fields and large $\omega_{\text{in-plane}}$ (red dots). This process can be explained as follows.
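Consider a discontinuity with a vanishing normal magnetic field ($B_N = 0$), a constant field magnitude ($|\mathbf{B}_1| = |\mathbf{B}_2|$, where $\mathbf{B}_1$ and $\mathbf{B}_2$ are the magnetic field vectors on the two sides), and a small field rotation angle ω. Both $\mathbf{B}_1$ and $\mathbf{B}_2$ then lie in the discontinuity plane, whose normal can be expressed as $\mathbf{e}_n = \mathbf{B}_1\times\mathbf{B}_2/|\mathbf{B}_1\times\mathbf{B}_2|$. Taking $\mathbf{e}_m$ along $\mathbf{B}_1+\mathbf{B}_2$ and $\mathbf{e}_l$ along $\mathbf{B}_2-\mathbf{B}_1$, the component $B_M = |\mathbf{B}_1|\cos(\omega/2)$ is the dominant component and is the same on both sides, while the field vector changes only in the L direction, from $B_L = -|\mathbf{B}_1|\sin(\omega/2)$ to $+|\mathbf{B}_1|\sin(\omega/2)$. Although $B_M$ may vary during the rotation from $\mathbf{B}_1$ to $\mathbf{B}_2$, the variation is tiny because ω is small, and it can be smaller than the fluctuations of $B_N$. Under these circumstances, MVAB is likely to confuse $\mathbf{e}_m$ with $\mathbf{e}_n$. Consequently, $B_M$ is misidentified as the normal magnetic field, and $B_L$ and $B_N$ are regarded as the in-plane components. Since $B_N = 0$, the in-plane field estimated by MVAB simply reverses direction along $\mathbf{e}_l$, so $\omega_{\text{in-plane}}$ is estimated to be 180°. This is consistent with the results in Figure 5(b).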
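The geometry argument above can be illustrated with a contrived synthetic ED in LMN coordinates; all numbers are assumptions chosen for illustration. The magnitude stays near 5, the field turns by ω = 40° within the L-M plane (so $|\Delta\mathbf{B}|/\langle|\mathbf{B}|\rangle \approx 0.68$ passes the identification threshold of 0.5), and only $B_N$ carries zero-mean fluctuations. Because the variation of $B_M$ is smaller than these fluctuations, MVAB returns a direction close to $\mathbf{e}_m$, with $\lambda_2/\lambda_3$ around 10, so the event looks like an RD with $\omega_{\text{in-plane}}$ near 180°.

```python
import numpy as np


def mvab(b):
    """(n3, lambda2/lambda3) from the variance matrix of a (K, 3) field array."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(b, rowvar=False, bias=True))
    return eigvecs[:, 0], eigvals[1] / eigvals[0]


def in_plane_rotation_deg(b, normal, edge=40):
    """Angle between mean tangential fields of the first and last `edge` samples."""
    n = np.asarray(normal, dtype=float) / np.linalg.norm(normal)
    tangential = b - np.outer(b @ n, n)
    t1 = tangential[:edge].mean(axis=0)
    t2 = tangential[-edge:].mean(axis=0)
    c = np.dot(t1, t2) / (np.linalg.norm(t1) * np.linalg.norm(t2))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))


# Synthetic ED: field turns from -20 to +20 deg in the L-M plane (omega = 40 deg).
rng = np.random.default_rng(2)
s = np.linspace(-3.0, 3.0, 400)
alpha = np.radians(20.0) * np.tanh(3.0 * s)
b_lmn = np.column_stack([
    5.0 * np.sin(alpha),                 # B_L reverses sign across the ED
    5.0 * np.cos(alpha),                 # B_M is dominant and nearly constant
    0.25 * rng.standard_normal(s.size),  # B_N fluctuates around zero
])
e_n = np.array([0.0, 0.0, 1.0])          # true normal

n_mva, ratio = mvab(b_lmn)
theta = float(np.degrees(np.arccos(min(1.0, abs(float(n_mva @ e_n))))))
bn_over_b_mva = abs(float(b_lmn.mean(axis=0) @ n_mva)) / np.linalg.norm(b_lmn, axis=1).mean()
omega_ip_mva = in_plane_rotation_deg(b_lmn, n_mva)
omega_ip_true = in_plane_rotation_deg(b_lmn, e_n)
```

With the true normal, the same event has $|B_N|/|\mathbf{B}| \approx 0$ and $\omega_{\text{in-plane}} \approx 40°$, i.e., it is an ED.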

Dependence on Window Width
The data window to which MVAB is applied is generally selected to cover the entire transition of the magnetic field from its initial to its final state. The MVAB results presented so far use the data in the long window (LW), which consists of three intervals (the transition interval and the pre- and post-state intervals), as shown in Figure 6(a). The determination of the transition interval (the red shaded area) was introduced in Section 2; the pre- and post-state intervals (the blue shaded areas) are adjacent to the transition interval and have the same width. To investigate the effect of the window width, we repeat MVAB with the data in the short window (SW), which coincides with the transition interval. Again applying a threshold of 8 for $\lambda_2/\lambda_3$, we find that 13,948 MVAB results meet the requirement under the SW condition, compared with only 5383 under the LW condition. Figure 6(b) shows the number distributions of $\theta_{\rm MVA\_Tim}$ under the LW and SW conditions. The SW condition clearly concentrates more $\theta_{\rm MVA\_Tim}$ values near 0°. Consider the distributions near 0° and 90°. Under the LW condition, 70% and 20% of the MVAB results satisfy $\theta_{\rm MVA\_Tim} < 30°$ and $\theta_{\rm MVA\_Tim} > 60°$, respectively, whereas the SW condition changes these values to 74% and 10%, i.e., more accurate results and fewer poor ones. In addition, the anomalous hump at 90° nearly disappears under the SW condition, indicating that the systematic error due to the special discontinuity geometry is mitigated. Figure 6(c) displays ${\rm med}(\theta_{\rm MVA\_Tim})$ as a function of $\lambda_2/\lambda_3$ under the two conditions; the two vertical dashed lines indicate the widely used threshold of 3 for $\lambda_2/\lambda_3$ and the threshold of 8 we suggest. The SW condition reduces ${\rm med}(\theta_{\rm MVA\_Tim})$ by ∼5°-10° for $\lambda_2/\lambda_3 \in [3, 8]$. Since small $\lambda_2/\lambda_3$ values occur much more frequently than large ones, the SW condition can significantly improve the performance of MVAB.
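A sketch of the two window definitions, assuming the transition interval is given as sample indices [i0, i1): the SW is the transition interval itself, and the LW adds adjacent pre- and post-state intervals of the same width (clipped at the ends of the record, which is our assumption). The helper names and the synthetic event are illustrative; this toy example is not meant to reproduce the SW advantage found in the statistics above.

```python
import numpy as np


def window_slices(i0, i1, n_samples):
    """Short- and long-window index slices for a transition interval [i0, i1)."""
    width = i1 - i0
    short = slice(i0, i1)
    # Pre- and post-state intervals have the same width as the transition
    # interval; they are clipped at the ends of the available data.
    long_ = slice(max(0, i0 - width), min(n_samples, i1 + width))
    return short, long_


def mvab(b):
    """(n3, lambda2/lambda3) from the variance matrix of a (K, 3) field array."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(b, rowvar=False, bias=True))
    return eigvecs[:, 0], eigvals[1] / eigvals[0]


# Hypothetical event: 300 samples with the transition occupying samples 100-129.
rng = np.random.default_rng(3)
idx = np.arange(300)
phi = 0.5 * np.pi * np.tanh((idx - 115) / 5.0)
field = np.column_stack([np.cos(phi), np.sin(phi), 0.5 + 0.05 * rng.standard_normal(300)])

short, long_ = window_slices(100, 130, field.shape[0])
results = {name: mvab(field[sl]) for name, sl in (("SW", short), ("LW", long_))}
```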

Estimating ID Composition by MVAB
Since the MVAB accuracy is closely related to the discontinuity geometry, it is reasonable to conjecture that the geometry is also connected with $\lambda_2/\lambda_3$. A threshold for $\lambda_2/\lambda_3$ might then affect the four types of discontinuities to different degrees and consequently bias the statistical results. We define the pass rate, a function of the lower limit of $\lambda_2/\lambda_3$ (labeled $L(\lambda_2/\lambda_3)$ for simplicity), as the proportion of discontinuities with $\lambda_2/\lambda_3$ larger than $L(\lambda_2/\lambda_3)$. The pass rates of the four types of discontinuities are shown in Figure 7(a). The TDs and EDs are evidently more sensitive to $L(\lambda_2/\lambda_3)$: a threshold of 8 causes 50% of the TDs and EDs to be discarded, but only 15% of the RDs and NDs, indicating that $L(\lambda_2/\lambda_3)$ can change the estimated type ratio of the discontinuities.
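The pass rate defined above amounts to a simple fraction per type; a minimal sketch with toy inputs (hypothetical helper names, not the Cluster data) is:

```python
import numpy as np


def pass_rate(ratios, lower_limit):
    """Fraction of events whose lambda2/lambda3 exceeds the lower limit."""
    ratios = np.asarray(ratios, dtype=float)
    return float(np.mean(ratios > lower_limit))


def pass_rates_by_type(ratios, types, lower_limit):
    """Pass rate computed separately for each discontinuity type label."""
    ratios = np.asarray(ratios, dtype=float)
    types = np.asarray(types)
    return {str(t): pass_rate(ratios[types == t], lower_limit) for t in np.unique(types)}


rates = pass_rates_by_type([2, 12, 5, 30, 9, 1.5], ["ED", "ED", "TD", "RD", "RD", "TD"], 8.0)
```

Unequal pass rates across types are exactly what turns a per-event quality cut into a bias in the type statistics.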
Let us set $L(\lambda_2/\lambda_3) = 8$ for the moment and calculate the discontinuity type ratio from the MVAB results. The resulting ratio, RD : TD : ED : ND = 34.4% : 4.9% : 51.1% : 9.6%, is shown by the magenta histogram in Figure 7(b). Since MVAB is likely to mistake some EDs for RDs, we recalculate the type ratio under the same limit $L(\lambda_2/\lambda_3) = 8$ using the timing results. This ratio, shown by the orange histogram, is RD : TD : ED : ND = 8.8% : 5.3% : 76.7% : 9.2%. Comparing the two, about one-third of the EDs are misidentified as RDs. However, the timing-based ratio does not reflect the true properties of IDs either, since it is biased by $L(\lambda_2/\lambda_3)$. The blue histogram presents the actual type ratio, RD : TD : ED : ND = 5.4% : 5.6% : 83.6% : 5.4%, which unfortunately cannot be estimated with MVAB.
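A sketch of the type classification and type ratios follows. It assumes the 2×2 scheme implied by the thresholds in the abstract: RDs and EDs both have $\Delta|\mathbf{B}|/|\mathbf{B}| < 0.2$ and are separated at $|B_N|/|\mathbf{B}| = 0.4$; splitting the remaining events ($\Delta|\mathbf{B}|/|\mathbf{B}| \ge 0.2$) into TDs and NDs at the same 0.4 threshold, and assigning the boundary value to RD/ND, are our assumptions. The last lines reproduce the one-third estimate from the percentages quoted above: the ED share drops by 25.6 points, exactly the amount by which the RD share rises (34.4% − 8.8%).

```python
import numpy as np

TYPES = ("RD", "TD", "ED", "ND")


def classify(bn_ratio, dbmag_ratio, bn_cut=0.4, db_cut=0.2):
    """Assign RD/TD/ED/ND from |B_N|/|B| and Delta|B|/|B| (assumed 2x2 scheme)."""
    if dbmag_ratio < db_cut:
        return "RD" if bn_ratio >= bn_cut else "ED"
    return "ND" if bn_ratio >= bn_cut else "TD"


def type_ratio(bn_ratios, dbmag_ratios):
    """Percentage of each type among the given events."""
    labels = [classify(bn, db) for bn, db in zip(bn_ratios, dbmag_ratios)]
    return {t: 100.0 * labels.count(t) / len(labels) for t in TYPES}


ratio = type_ratio([0.1, 0.6, 0.05, 0.5], [0.01, 0.02, 0.3, 0.4])

# Share of timing-identified EDs relabeled as RDs by MVAB (lambda2/lambda3 > 8).
timing_ed, mva_ed = 76.7, 51.1
misidentified_fraction = (timing_ed - mva_ed) / timing_ed
```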
A frustrating finding is that it seems impossible to obtain the true statistical properties of IDs with MVAB, even though the method may achieve satisfactory accuracy in case studies. The statistical bias arises from two sources. The first is the MVAB error and misidentification in individual cases, which can be mitigated by raising the threshold for $\lambda_2/\lambda_3$. The second is that the empirically adopted requirement on $\lambda_2/\lambda_3$ is equivalent to an inhomogeneous selection of the discontinuity population: different types of discontinuities have different pass rates, as shown in Figure 7, which distorts the statistical results. New methods are needed to overcome this problem.

Summary and Discussion
The MVAB method, owing to its numerical simplicity and loose data requirements, has become the most widely used technique for diagnosing planar magnetic field structures in space (Mazelle et al. 1997; Söding et al. 2001; Cao et al. 2013). However, the error of MVAB is poorly understood because it depends on many factors. In contrast, the timing method, though applicable only to four-spacecraft data, produces results with more controllable errors. In this study, we collect 21,978 IDs from the Cluster data in the solar wind during 2002 July 1-2008 December 31. Of these events, 6752 can be analyzed by timing with satisfactory precision.
Based on timing, the types of these discontinuities are determined and yield a ratio of RD : TD : ED : ND = 5.4% : 5.6% : 83.6% : 5.4%, which differs considerably from previous findings based on either MVAB or timing. Previous MVAB-based studies generally indicate a dominance of RDs in the solar wind (Horbury et al. 2001; Söding et al. 2001; Wang et al. 2013; Liu et al. 2021, 2022a), while studies using timing/triangulation claim that there are few (or even no) real RDs but a large number of EDs (Horbury et al. 2001; Knetter et al. 2003, 2004). This study refutes the dominance of RDs and reveals why EDs are misidentified by MVAB as RDs: the misidentification is related to a special geometry of these EDs. Moreover, the conclusion of Knetter et al. (2004), based on timing, that RDs do not exist in space may not be correct either. Their underestimated proportion of RDs may result from their identification method (using 1 minute resolution data, which miss discontinuities at small spatial scales) or may have arisen by chance, as their data set contains only 129 discontinuities. Liu et al. (2022a), using Magnetospheric Multiscale mission data and MVAB, reported a ratio of RD : TD : ED : ND = 68% : 5% : 20% : 7% in the solar wind. We note that their combined RD and ED proportion (88%), which is independent of $|B_N|/|\mathbf{B}|$ and thus unaffected by the analysis technique, is close to that in this study (89%), implying that the confusion between RDs and EDs is the major defect of MVAB.

Figure 7. (a) The pass rates, i.e., the proportion of the discontinuities with $\lambda_2/\lambda_3$ larger than the lower limit $L(\lambda_2/\lambda_3)$. (b) The type ratios of the discontinuities: the blue and orange histograms are obtained by timing, taking into account all the discontinuities or only those with $\lambda_2/\lambda_3 > 8$; the magenta histogram is based on the MVAB results for the discontinuities with $\lambda_2/\lambda_3 > 8$.
We also apply MVAB to our data set and compare the timing and MVAB results. Since the normal vector is the most important parameter of a discontinuity, we statistically analyze the angles between the normals estimated by MVAB and by timing. The eigenvalue ratio $\lambda_2/\lambda_3$ is the primary factor governing the MVAB accuracy: the median angular error decreases as $\lambda_2/\lambda_3$ increases and falls below 30° when $\lambda_2/\lambda_3$ exceeds 8. The discontinuity geometry also significantly influences the performance of MVAB. EDs with small $|B_N|/|\mathbf{B}|$ and $\omega_{\text{in-plane}}$ (the angular change of the in-plane magnetic field) are likely to be misidentified by MVAB as RDs with dominant normal magnetic fields and large $\omega_{\text{in-plane}}$, and we have revealed the reason. This also explains the spurious RD predominance reported in previous MVAB-based studies. We further investigate the dependence of the MVAB accuracy on the width of the data window to which MVAB is applied; a short window (SW) as defined above considerably improves the stability of MVAB and reduces the angular error. We also test the impact of data filtering (not shown) and find that it hardly changes the performance of MVAB.