On general correlation between 3D solar wind speed and density model and solar proxies

The solar wind (SW) is a supersonic outflow of plasma from the solar corona, with the latitudinal speed and density profiles varying with the solar activity. The SW protons charge exchange with the inflowing interstellar neutral atoms and create energetic neutral atoms (ENAs), which bring information on the physical state of the plasma within the boundary region of the heliosphere. The speed of the ENAs depends on their energies, and consequently observations at different energies provide information on different epochs backwards in time. Therefore, understanding the history of the evolution of the SW is important to understand this information. In this paper, we extend the work by \citet{porowski_etal:22a}, who provided the WawHelioIon 3DSW model of the time evolution of latitudinal profiles of the SW speed and density based on results of analysis of interplanetary scintillations (IPS). Based on results of Principal Component Analysis, we seek for correlation between selected solar proxies and the structure of the SW obtained from IPS and show that it is possible to reproduce the evolution of the SW structure during the past three solar cycles based on the proxies. With this, we extend the history of the evolution of the SW structure back to 1976, i.e., to the epoch when observations of the key proxies -- the inclination of the SW current sheet and the solar polar magnetic fields -- became available. We point out the potential of the use of the proxies for forecasting the structure of the SW into the future.


INTRODUCTION
The heliosphere is a bubble in the interstellar medium blown by the solar wind (SW). Information on processes operating in the boundary region between the SW and the interstellar gas can be obtained at 1 au from observations of energetic neutral atoms (ENAs) of hydrogen that are created by charge exchange between interstellar neutral atoms and PUI protons carried by the SW (Gruntman 1997, Gruntman et al. 2001. The latitudinal structure of the SW varies during the cycle of solar activity, with the velocity and density profiles being relatively flat during the activity maxima and highly structured during the minima , Lallement et al. 2010, McComas et al. 2008, Tokumaru et al. 2010. The intensity of the production of hydrogen ENAs depends on the flux of interstellar PUIs, which depends on the flux and speed of the SW, how PUIs are accelerated at the heliospheric termination shock and in the inner heliosheath (e.g., Zank et al. 1996, Chalov et al. 2003, Yang et al. 2015, Fahr et al. 2016, Kumar et al. 2018, Mostafavi et al. 2019, Giacalone et al. 2021, Zirnstein et al. 2021, and the physical processes governing the origin of the IBEX ribbon (e.g., McComas et al. 2009, Heerikhuisen et al. 2010, Schwadron & McComas 2010, Zirnstein et al. 2015, Florinski et al. 2016, Dayeh et al. 2019, Mousavi et al. 2022. The fluxes of ENAs observed at 1 au are additionally modulated by re-ionization losses due to charge-exchange with SW protons and photoionization inside the termination shock of the SW (Bzowski 2008).
The charge-exchange interaction of interstellar neutral hydrogen with the structured SW inside the heliosphere results in variations of the density and flux of interstellar hydrogen within a few au from the Sun (Bzowski et al. 2002, Sokół et al. 2019b, where most of the heliospheric backscatter glow observed at 1 au is formed (Ruciński & Bzowski 1995). Especially in the downwind hemisphere, including the region near the ecliptic plane, the latitudinal variation of the SW structure affects the density of interstellar neutral H and the helioglow intensity (Bzowski 2003). These variations are also visible in direct-sampling observations of interstellar neutral H atoms performed at 1 au by IBEX-Lo experiment onboard the Interstellar Boundary Explorer (IBEX) mission (Galli et al. 2019. The variations in the density of interstellar neutral H affect the production rate of the PUIs. ENAs observed at 1 au at different energies bring information on the state of the plasma in the heliosheath at different epochs (Zirnstein et al. 2016, Reisenfeld et al. 2019) because of the energy-dependent travel times of ENAs from their creation sites to the detector. Also the primary and secondary populations of interstellar neutral H feature long travel times, equal to several solar cycle periods (Bzowski & Kubiak 2020). With the advent of fully time-dependent models of the heliosphere, able to use models of time evolution of the solar factors (e.g., Izmodenov & Alexashov 2015), modeling time variations of the ENA productions becomes feasible.
Therefore, the evolution of the SW structure is important for interpretation of observations of heliospheric ENAs, interstellar neutral atoms, and the helioglow, in addition to the interest in the SW itself. Pioneering studies of the latitudinal structure of the SW were performed in situ by Ulysses during its three orbits in a solar polar plane. The SW speed and density profiles are available for one solar maximum and two solar minima (Wenzel et al. 1989, Bame et al. 1992, Gloeckler et al. 1992, McComas et al. 2008. Especially valuable are those performed at the perihelion portions of the orbit, the so-called Fast Latitude Scans, where the full range of heliolatitudes was scanned within only several months. The measurements performed during the remaining portions of the Ulysses orbits are more challenging to interpret because the duration of the heliolatitude scan was comparable to the time scale of changes in the solar activity during the solar activity cycle. In practice, systematic monitoring of the SW structure is only possible using remote-sensing methods based on observations performed from the ecliptic plane. One of these is the interplanetary scintillation (IPS) method. It relies on multiple-station observations of scintillation of radio waves from compact radio sources, caused by density fluctuations of SW electrons (Hewish et al. 1964, Manoharan 1993, Kojima & Kakinuma 1990. The computer-aided tomography of IPS observations enable reliable determination of the solar wind structure inside 1 au (Jackson et al. 1997, Jackson et al. 1998, Tokumaru et al. 2010, Tokumaru et al. 2021. The SW speed is determined by observing IPS for a radio source using geographicallyseparated antennas. Based on observations of multiple sources performed during the period of solar rotation, a synoptic map of the solar wind speed as a function of heliolatitude and heliolongitude is produced. The SW speeds determined using the IPS method involve the effect of weighted integration along the line of sight. The tomographic analysis, in which correlation between the SW speed and the strength of density fluctuations is assumed, enables correcting for this effect. These monthly maps, released once per year, are publicly available from the Institute for Space-Earth Environmental Research in Nagoya, Japan (Tokumaru et al. 2021).
Inevitably, because of the geometry of the observations, the speeds obtained for the highest latitudes are the most uncertain (see, e.g., Figure 7 in Sokół et al. 2013). Also, the coverage of the sky is very non-uniform because of the non-uniform distribution of suitable radio sources and the location of the antennas in the northern hemisphere of the Earth. Additionally, there are seasonal gaps in the sky coverage because of antenna maintenance during winter months. Individual data points in the synoptic maps feature a large scatter and some of them are outliers. However, heliospheric studies need a continuous and regular coverage, free from outlying elements.
To address the postulate of a continuous and regular coverage, Sokół et al. (2013) and Bzowski et al. (2013) decided to sacrifice the longitudinal and a portion of the temporal resolution of the IPS SW speeds for the sake of regularity. They averaged the Carrington synoptic speed maps over heliolongitudes during individual calendar years, obtaining yearly heliolatitudinal speed profiles. Subsequently, they fitted parameters of an analytic model defined as first-and second-order smoothly-connected splines and obtained an analytic model of the yearly speed profiles, which can be linearly interpolated between individual years in time. This approach was subsequently continued by Sokół et al. (2019a) and Sokół et al. (2020).
During the processing of the Carrington maps, Sokół et al. (2013) observed that the magnitude of the speed was correlated with the number of line of sight observations used to compile the map. Consequently, they decided to discard the monthly maps developed based on the number of individual line of sight observations below a certain threshold.
The presence of potential outliers in the Carrington maps of the SW speed and limitations of the analytic model used by Sokół et al. (2013) and Sokół et al. (2020) prompted Porowski et al. (2022) to revisit the approach to the approximation of the SW structure. These authors developed a novel statistical method of identifying and rejecting outlying data in the Carrington maps. They cleaned the yearly latitudinal profiles of the SW speed and approximated them with a sum of the Legendre polynomials up to an order varying between the years, with an additional constraint of disappearing of the derivatives of the profiles over heliolatitude at the solar poles.
Results of application of this scheme to reprocessed SW tomography data for the entire observation period 1987-2020, presented by Tokumaru et al. (2021), who modified the relation between the scintillation intensity and the SW speed, were collected in the WawHelioIon 3DSW model of the SW speed, almost free from outliers. The mathematical form of the yearly solar wind latitudinal speed profiles in the WawHelioIon 3DSW model is defined in Equations 1-2 in Porowski et al. (2022). A yearly profile is represented by a sum of the Legendre polynomials up to a certain order n, with the coefficients numbered from 0 to n. The model features an additional constraint of null derivative over heliolatitude at the poles, which reduces the number of independent parameters of the model to n − 1. The parameters of the model make a time series covering the time interval 1985-2020.
In the present paper, we continue the development of the WawHelioIon 3DSW model. We extend the measurement data base with observations from 2021. Because the time series created from the SW model coefficients covers almost four solar cycles, we expect that it is long enough to comprehend information on the temporal evolution of the SW on time scales longer than one solar activity cycle. We look for correlations between the shapes of the solar wind speed profiles, and demonstrate that a carefully selected set of proxies is able to reproduce the observed profiles of the solar wind speed.
The objective of the paper is to find a general correlation between solar proxies and the SW structure that will provide a capability to calculate the mean yearly SW speed profile based on a selection of proxies. We examine the properties of the proxy time series to assess its applicability in expanding the SW evolution model in time (backward and forward).
The analysis starts with a mathematical formulation of the generalized SW model in Section 2. As a preparation for implementation of this model, in Section 3 we adapt the SW model developed by Porowski et al. (2022) used as the input SW model, and in Section 3.2 we reduce its dimensionality using the Principal Components Analysis (PCA) method. Subsequently, in Section 4, we take proxies that potentially can be used for reproduction of the SW speed profiles and, applying again the PCA analysis, we select a subset of proxies that is actually used to construct the generalized SW model. Next, in Section 5, we present the extended SW model, which we refer as the Generalized SW model based on selected proxies, and show its example application. In the Discussion (Section 6), we show the performance of the proxy filtering and provide some comments on the model properties when applied to individual solar cycles. We complete this section with a comparison of the proxy-based model with in situ SW speed observations from Ulysses (Section 6.3). The results of the model development are presented in Section 5, where all details needed to implement the model are given. The paper is completed with a summary and conclusions in Section 7.

MATHEMATICAL FRAMEWORK
The SW model recently published by Porowski et al. (2022) provides a functional description of 35 yearly mean latitudinal profiles of the SW speed for the years 1985-2020. The profiles are obtained as a linear combinations of bases built from Legendre functions (the model base) and one of m = 35 individual vectors of parameters (the parameter vectors). Each of the parameter vectors is individually assigned to one of the 35 years in the range 1985-2020, without year 2010. The parameter vectors are obtained from separate fits of the base functions to IPS data. The parameter vectors, together with the base functions, provide a precise estimation of a yearly-averaged latitudinal SW speed profile for the middle of a given year.
In this work, we adopt a collection of solar proxies to see whether it is possible to express the SW model parameters through the proxies. Both the collection of solar proxies and the SW model parameters are treated as two potentially correlated multidimensional time series, in which we eliminate temporal dependence between them using a multilinear approach to express one by another. The SW model parameters used in this work are obtained from fits of the Legendre-base SW model to the IPS data that have been previously processed using the methodology developed by Porowski et al. (2022), with an extension to the data covering year 2021, which became available after completion of the analysis performed by these authors. The choice of proxies for the final model is based on an analysis that we discuss later in Sec. 4.
In the multilinear approach, it is desirable to provide uncorrelated inputs. Because both the commonly known proxies and the SW model parameters are usually highly correlated, we decompose them into uncorrelated components using PCA. Mathematical details of PCA are available in many textbooks, including, e.g., Jolliffe (2004) and Shlens (2014). As a result of the application of PCA, the model parameters are not only decomposed into uncorrelated components, but also potential multilinearities and correlations between the model parameters are removed. Additionally, the PCA has also a feature of model dimensionality reduction, which is also used in our analysis.
The multilinear formulation we used may be expressed as follows. Denoting the i-th Principal Component (PC) of the input SW model parameter as PC SW model i and the vector of proxy PC as PC proxy , we assume a general relation between them in the form: where c i is the i-th constant parameter, M i j is an element of an m × n matrix of coefficients M, which is describing a linear combination of the SW model parameters and the proxies, n is the number of proxies, PC proxy j is the j-th element of the proxy PC vector. In the application of this formula to the PCs of the parameters of the reduced model, the index i ranges from 0 to m. Our objective is to determine the matrix M by means of parameter fitting for all years for the optimum values of n and m. The values of n and m are found empirically.
In the formulation shown above, the matrix M and the vector c may be regarded as a generalization of the SW speed model, since the information included in matrix M represents a generalized response of the model to a given proxy PC vector. Even though M is retrieved using a proxy time series for a limited number of years, the time is not explicit in this formulation. So, under assumptions that (1) the general dependence between the proxies and SW profiles deduced from the fitting is a representative description of the dependence between the proxies and SW speed profile in general, i.e., for all times, and (2) that the transformation matrices used to calculate the PCs of the proxies and the PCs of the parameters also are representative, the matrix M may be regarded as a generalized estimator of the SW profiles outside the time range covered by the available time series of the available IPS observations, which returns SW speed profiles when a proxy vector is multiplied by this matrix.
The matrix M and the vector c are obtained by fitting to the data for the years 1985-2021, after necessary rearrangements. Since the components of PC SW model and PC proxy are uncorrelated, the fitting procedure that minimizes the expression in the time domain is performed separately for each component of PC SW model . It means that each of the m components, (PC SW model i ) is fitted individually and expressed as a linear combination of the components of PC proxy . We used a Wolfram Mathematica function NonlinearModelFit with automatic selection of minimization method.
Similarly as in the case of the transformation matrix from the Legendre base to that obtained from the PCA analysis for the SW, the robustness of M will be verified during the coming years, when more SW IPS data will be available. In our opinion, it is desirable to use the data from a time interval covering full solar cycles. At present, we have covered three full cycles and a small portion of the fourth one. We decided to not exclude the most recent years from the analysis to acknowledge for the presence of long-term variations in the solar wind, with a time scale much longer than that of the 22-year Hale cycle.
Using the above formulation, we performed an empirical study of the generalized SW model, which allowed us to determine the optimum model setup, which provides the best SW model estimations. The optimum setup that we developed brings stable and unique solutions of the SW profile estimations with a reasonable accuracy and satisfactory agreement with the Ulysses data. The setup determines the magnitudes of m and n, i.e., the numbers of m model of the components of PC SW model and the n proxy components of PC proxy , and the composition of the proxy vector. The quality of the solutions provided by the optimum model setup was checked by comparing the SW profile estimations provided by the generalized model for a wide variety of possible m model and n proxy values, as well as for different proxy vector compositions. More details will be presented below.  Porowski et al. (2022). In order to apply the formula from Equation 1, the input model is refitted to obtain a common order for all the years. The order of the input SW model varies from one year to another, resulting in a variable number of model parameters (i.e., the parameter vectors lengths) hanging from one year to another within the range between 9 and 17. The variable order of the parameters was previously used to obtain a distribution of the residuals as close to the Gaussian as possible. It was motivated by the goal that the model is supposed to provide an optimal and precise SW speed profile, treating each year individually. However, a constant dimension of the input model is required in the formula in Equation 1. Therefore, the analysis starts with adoption of a common order of the SW input model, which requires refitting of the model. The use of a common order of the input SW model also allows us to apply the PCA in the analysis. The refitting includes also an extension of the IPS data by another year (2021), for which the IPS data are now available. This increases the number of years with data available to 36. The preprocessing of the IPS data for this additional year was done identically as for all the other years (Porowski et al. 2022).
The first issue to address was selection of the common initial model order m init . We found that the generalized SW model provides almost identical estimations of the SW profiles for a wide range of the m init values 8-22 when a filtration level at 1% in the PC domain is adopted. We selected m init = 20, which was motivated by the available length of the proxy vector limited by data and filtering. From this, we obtained 19 independent SW model parameters which are fitted to each of the 36 years individually. Each parameter corresponds to a given model order (and so to a given element of the Legendre base), so the fitting provides 19 vectors with 36 vlues of model parameters, which can be regarded as 19 sets of the time series.
Next, we analyze the time series composed of the refitted coefficients. The time series of the fit parameters of the SW model are shown in Figure 1.
Visual inspection of the time series of the model parameters reveals a quasi-periodic behavior in some of them. In particular, the zeroth, second, and fourth parameters clearly show a periodicity corresponding to the solar cycle variations. While the zeroth parameter (i.e., a constant value) corresponds to the latitude-averaged SW speed, the physical and temporal interpretation of the other parameters is more complex. Therefore, to investigate their properties, we analyze the temporal variations of the parameters using the autocorrelation function (ACF), calculated for time lags ranging from 1 to 30. The results are plotted in Figure 2 with the confidence bands for the confidence level cl = 75%.
Analysis of the ACF of the SW parameters confirms the existence of seasonality in the first three even-numbered model parameters, i.e., for #0, #2, and #4, which describe the latitudinally-symmetric variation of the speed profiles and closely follow the 11 years solar cycle period. The remaining parameters feature periodicities close to 11 years (like #6) or different than 11 years, being in general agreement with the SW periods reported by, e.g., Prabhakaran Nayar (2006) and Katsavarias et al. (2012). While the parameters #1 and #3 show the strongest periodicity for ∼ 16 years lag, a ∼ 5.5 years periodicity seems to be featured in the ACF of parameter #5. The coefficient #7 features a strong periodicity of ∼ 12.5 years. In general, the oddnumbered parameters, which describe the asymmetric part of the model, show different properties than the even-numbered ones. No straightforward solar cycle-related seasonality is visible, but since their ACFs show that the statistically significant lags are present and often close to the periods existing in the power spectrum of the SW speed, it is apparent that traces of a complex periodic behavior of the SW in the odd-numbered parameters exist, and that the SW model parameters are related to the physical parameters of the Sun. However, in this work we will not try to convey how the parameters correspond to the physical properties of the Sun, leaving it to be investigated in the future. It is also not excluded that at the moment the character of the temporal structures for the odd-numbered parameters may be not fully descriptive because of the limited statistics and because during the investigated time interval the SW featured secular (or very long-period) variations, including a drop in the mean flux (see, e.g., McComas et al. 2013, Sokół et al. 2021.

Reduction of the dimensionality of the input SW model
The common seasonality, seen in the ACF for some of the SW model parameters in Figure 2, strongly suggests that some of these parameters may be redundant. Therefore, as the next step of the analysis, we decompose the parameters into uncorrelated components using PCA and filter out the least significant of the resulting components from further steps, i.e., we perform filtering in the PC space. A bar plot showing the contributions of the PCs of the SW model parameters to the total variation, ordered by their significance, is shown in Figure 3. The time series of the SW model parameter PCs are shown in Figure 4, and a new base of the model, being a transformed Legendre bases into the PCA space, is shown in Fig. 5.
At this stage of research we feel unable to conclude if the transformation from the Legendre base to the PCA-optimized base that we have just derived has an universal character, or if acquiring more IPS observations, expected during the forthcoming yeas, will significantly modify the PCA-derived basis. We anticipate that at least a few first base functions will not change significantly, because we verified that for different model dimensionalities n init from the range 8-22 the first few bases are very similar. The parameters of the SW model in the PCA base are denoted PC SW model . They may be regarded as the coefficients of the corresponding elements of the new base.
The elements of the new base are weighted by the corresponding PCs. For example, it is seen that the first PC (PC0), which contributes to the total variation by more than 60%, shows a seasonality that follows the heliolatitude-averaged SW speed, which modulates the corresponding V-like shaped element of the base (see the upper-left panel in Figure 5). Consequently, PC0 modulates the SW profile into the characteristic V-shape during the minima of the solar activity. The other components have a % of total variation 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17   lower weight and contribute to details of the shapes of the yearly profiles. While PC1 modulates mostly the polar regions, PC2 and PC3 describe the north/south (N/S) asymmetry.
We determine the optimal order of the SW model assuming a certain cutoff level of the PC contribution to the total variance, below which the PCs are considered as insignificant and rejected. Based on empirical study we found that the optimum cutoff value for the PCs of the coefficients is 1% of the contribution to the total variance. This reduces the number of the SW model parameters from 19 to 9. The model with the reduced common dimensionality will be referred to as the reduced SW model. The reduced SW model is used in fitting the proxy-based model. A comparison of the solar wind data, the original model from Porowski et al. (2022Porowski et al. ( ) (extended to 2021, and the reduced model are presented in Figures 6 and 7. Now we check how elimination of the insignificant PCs impacts the model SW profiles by analyzing the reduced-model residuals as a function of heliographic co-latitude φ (i.e., φ = 0 for the north pole and φ = π for the south pole). The residuals are arranged in 40 equi-areal latitudinal bins that correspond to bands on the solar sphere parallel to the solar equator. Selection of such bins reduces the impact of the inhomogeneous IPS data coverage near the polar regions, where the sparsity of the data and the relatively large spread of those that are available might lead to an overestimation of the residual dispersion. It was found that the profiles of the reduced SW model do not differ significantly from those in the refitted SW model, and that the reduction of the model dimensionality does not bias the results. This is illustrated in Figure 8. It is seen that due to the PCA filtering the model accuracy is changed insignificantly.
Concluding this part of our research, we have re-formulated the refitted SW model obtained by Porowski et al. (2022) and extended with data from 2021 into a reduced model, where all the yearly profiles are represented by the same number of 9 terms. The model is defined as a linear combination of the base functions obtained from the PCA analysis. This model can be used directly and supersedes that proposed by Porowski et al. (2022). In the further part of the paper, we focus on developing an approximation of this model using solar proxies.

PROXY SELECTION AND PROCESSING
Most of the solar proxy time series provide convenient measure of the solar activity evolution and show temporal variations that follow directly the solar cycle seasonality. On the other hand, the ACFs of the SW model parameters suggest that the parameters are characterized by a few independent types of variations. The fact that the SW parameters are deduced from high-resolution latitudinal profiles of the SW speed measured over many years suggests that other types of variabilities are of the solar origin, since many aspects of the solar activity must have left their imprints on the SW profiles. Therefore, we examine if the proxies also consist of components featuring different types of the solar seasonality, similar to those apparent in the SW model parameters. If so, they must be connected with the main variability and are less apparent in the proxy data because of their possible weak relative  amplitude. Therefore, we expect that if we are able to separate other potential types of the intrinsic solar variations imprinted in the solar proxies, we will be able to link the proxies with the SW model parameters. Our objective in this subsection is to identify and disentangle the expected traces of variations in the proxies that are different than those related to the 11-year solar activity cycle, and to use them in extension of the SW model.
The proxies that may be potentially adopted for the model extension must fulfill the following prerequisites. The number of independent proxies must be similar to that of the base functions in the reduced SW model. At least some of the proxies must be correlated with the 11-year variation of the solar activity cycle. The proxy set must be able to reproduce the N/S asymmetry in the SW. The proxies must have a long time coverage, ideally extending before the beginning of the available IPS solar wind data , and be available nowadays. At least some of them must be connected with phenomena related to the solar magnetism, which is one of the factors responsible for the creation of the solar wind.
On the other hand, it does not seem to be advantageous to admit several proxies very strongly correlated with each other, like, e.g., the F30 (Tanaka & Kakinuma 1957, Shimojo et al. 2017) and F10.7 (Tapping 2013) solar radio flux, because this would result in yielding little additional information to the system but would certainly add the inevitable noise to the system. Fujiki et al. (2015) pointed out a usefulness of polar fields as proxies related to the solar wind structure. We identified a proxy set that meets the requirements listed above and provides a satisfactory estimation of the SW profiles. The proxy set consists of the components listed in Table 1, which also include proxies that had been considered as candidates during the tuning of the generalized model, but finally were rejected as a result of tests performed. Rejection of some of the candidate proxies was mostly because of their redundancy or negligible impact on the results. We found that the current sheet tilt angles in the north and south (CS N/S ) are  Tokumaru et al. (2021). The time series of the selected proxy set is shown in Figure 9.  . Comparison of reduced χ 2 for the initial and reduced SW models as a function of cos(φ). The reduced SW model is built on the first 9 PCs out of the 19 components obtained from the refitted SW model. Since only slight differences between the refitted and the original models are seen, it suggests that the order of the initial model is too high and that the model parameters are redundant.
Note that some of the selected proxies are correlated with the magnitude of the solar activity, and some of them with the strength of the solar magnetic field. While some of them are quantities averaged over the whole solar disk, some other ones are provided with discrimination between the north and south solar hemispheres. Additionally, proxies measured in situ close to the solar equator are also used. These properties fulfill the requirements stated above.
The proxies were yearly-averaged to provide a temporal resolution identical to that of the IPS-based model, and standardized before use (i.e., the mean values were subtracted and the results divided by the respective standard deviations). Before averaging, the input proxy data at their full resolution were filtered against values derived from other proxies to remove potential artificial correlation enhancements.
Before application of the candidate proxies to the SW modeling, we attempted to identify and disentangle the common dominating variations existing in the proxy set into uncorrelated components and to eliminate potential redundancies that still might be present in the selected proxies. We processed the variations similarly as in the case of SW model parameters, i.e., we applied PCA to the full set of the selected proxies for the years 1985-2021. The ACF of the proxy PCA showed separation of the commonly known solar cycle periodicities in the first three PCs (Figure 10). In addition, the higher orders of the proxy PCs also showed evidence of weak seasonalities, but at the boundary of the adopted confidence level, also showing a moving average-like type of periodicity in the data set. Subsequently, we subjected the results of the proxy PCA to a dimension-reduction analysis, similar to that performed for the coefficients of the refitted SW model. However, the sensitivity of the predicted profiles provided by the generalization procedure to the proxy PC filtering is high. Therefore, the optimization of the proxy PC filtration level was performed with respect to the total accuracy of the generalization procedure, and simultaneously to the stability of the profile predictions provided by the generalization. We found that use of 6 of 9 proxy PCs in the fitting performed within the generalization protocol is optimal. The methodology of the optimization is discussed in section 6.1.

The extended SW model
Based on the development presented in the preceding sections, we propose an extended model of the evolution of the latitudinal structure of the SW, which is a base for the most recent version of the WawHelioIon 3DSW model. The extension is purely empirical, based on the model by Porowski et al. (2022), and has a yearly resolution in time and an infinite resolution in heliolatitude. Since the extension relies on fitting to all years at once, it comprises information about long-term evolution of the SW and offers an ability to predict SW speed profiles for times outside the IPS data interval. For the years for which IPS measurements are available, i.e., 1985-2021 (except 2010), it provides SW speed profiles in agreement with IPS observations, as shown with the red line in Figures 6 and 7. For the years outside this set, for which all necessary proxies are available, i.e., for the years 1976-1984, it provides proxy-based estimates of the 3D structure of the SW under assumption that no abrupt changes in the mean solar physical properties occurred during the year. The obtained proxy-based predictions are stable and show the SW speed evolution as expected qualitatively based on the phase of the solar activity cycle. The overall quality of the prediction ability of the proxy-based backward extension is confirmed by a comparison of the model predictions with the Ulysses in situ measurements, as shown in the Figure 15.
The model extension is provided as a set of matrices and vectors. The set of matrices, according to Equation 1, includes the main matrix (M) and the vector of constants (c), shown in Equations 2 and 3, respectively. Additionally, a set of matrices and vectors necessary for preprocessing of the proxy data is provided (the matrix of proxy data must have the units and order as in Table 1). This set which includes: (1) the normalization factors of the PCA (norm proxy ) and (2) matrix to calculate the proxy transformation to the frame of the PCA (M pca ), shown, respectively, in Equations 4 and 5. As an example, a proxy matrix used in fitting (x) is shown below in Equation 6. The basis in the PCA frame and additional constants are in Equations 7 and 8, respectively.
Each row of x corresponds to a separate year and consists of yearly averages of individual proxies sorted in the ascending order (1985-2021, without 2010). The columns hold the time series of the individual proxies according to enumeration in Table 1. Before use, the columns of x must be standardized using vector basis const (i.e., the arithmetic mean value must be subtracted from the time series in each column, and the results divided by the standard deviations of the time series).
In order to estimate a profile for the desired time using the matrices above, one must execute the following steps. (1) Obtain the proxy vector, with the components set according to the order and units listed in Table 1. The mean time at which the proxy values were prepared will be the time for which the profile estimation is made.
(2) Standardize the proxy vector using the mean and standard deviations stored in vector norm proxy , in which the mean is the first column, and the standard deviation is the second column.
(3) After standardization of the proxy vector, perform its PCA by calculating the following product: proxyPC = −(proxyM T .proxy T ) T , and removing the last 3 values.
(4) Calculate the following product, which returns the coefficients of the SW model: coeffs = c + M.modelProxyInput, and join the basis const to coeffs. (5) The profile is ready for insertion of a value from the range of cos(φ) ∈< −1, 1 >. After calculating the following product: pro f ile(cos(φ)) = (coeffs.basis)(φ), one obtains a polynomial describing the mean SW profile.

Application
As an example of practical use of the derived correlation between the SW speed structure and the solar proxies coded in the form of matrix M, we estimate yearly average 3D SW speed structures for years the 1976-1984 and 2010. The above selection of years is made based on the availability of proxies, and also to fill the IPS gap in 2010. To obtain an estimation of the SW structure, we prepare yearly averages of the solar proxies, which we subsequently which normalize and calculate a product between the matrix M and the proxy vector, with an addition of the constant values that are also provided from the fit. The overall description of practical application is described in the previous section. The results of estimation are shown in Figure 11.
The temporal variations of the predicted profiles seem exactly as expected, based on the solar activity history during the prediction years. The SW speed profile evolves from the shape characteristic for the solar minimum, which was about to end around 1976. During the following years, the profiles smoothly flatten at the maximum of Cycle 21, which occurred around 1980, to begin to rise at the poles, heralding a transition to another solar minimum, which occurred after 1985. This temporal behavior of the predictions for the years 1976-1984, based on the experience obtained from analysis of the solar cycles for which IPS data are available, suggests that they are correct. Also the prediction for year 2010, i.e., the year during which one expects flattening of the SW speed profile due to transition to the solar maximum epoch, was indeed obtained in these calculations.
The calculation of the density is recommended to be performed identically as in (Porowski et al. 2022), i.e., using the invariant SW energy flux estimated from the OMNI2 in situ time series. Results of the speed and density calculation are presented in Figure 12.

DISCUSSION
The proxy-based model of the solar wind speed structure thus derived can be used to calculate yearly-averaged speed profiles of the solar wind for the times when IPS measurements of the SW speed are not available. Derivation calls for using yearly averages, but the time interval over which the averaging of the proxies will be performed may be selected according to specific needs. For example, the yearly profiles may be obtained not for the middle of a given calendar years, as we chose in this paper, but for any other part of a year. In particular, it can be centered at the halves of individual Carrington rotations. Thus, one might think of replacing the scheme used by Porowski et al. (2022), where the yearly-averaged SW parameter values obtained from IPS analysis are interpolated linearly for the centers of Carrington rotations, with a scheme where a speed profile for a given Carrington rotation is calculated from the proxies averaged over 13 Carrington rotations, straddling this selected one. This scheme cannot be expected to reproduce latitudinal profiles of the SW speed averaged over individual Carrington rotations. Analysis of longitude-averaged synoptic maps of the SW speed developed by CAT analysis (Tokumaru et al. 2021) suggests that changes of the SW speed profiles may be quite abrupt, occurring at time scale comparable to the Carrington period. Thus, while trends may be reproduced, most likely details will be missed.
An alternative approach could be calculating the speed profiles based on monthly averages of the proxies transformed using the PCA-derived matrix. A comparison of the calculated profiles with IPS-derived data is outside the scope of this paper and will be performed in the future.
Potentially, the method derived in this paper may be applied to forecasting the solar wind structure. This could be performed by calculating the SW speed profiles based on the most recent proxies transformed with the PCA matrix. This approach might be useful, e.g., for calculating the survival probabilities of energetic neutral atoms (Bzowski 2008 years: 1985-2021. It was shown that the dimension of the input SW model may be reduced from 19 to 9 when the PC analysis is applied. A similar dimension reduction procedure was also applied to the proxy PCs. It was motivated by an empirical study that showed that the least important proxy PCs do not improve the model but may contribute to propagation of uncertainties into the final solutions when projection of the model on years outside the IPS era is made, and also may lead to overfitting. Here, we show the tuning of the filtering procedure of proxy PCs, aimed to find the optimal filtering level, at which the least important proxy PCs are eliminated from the fit, but the remaining PCs preserve the salient information. The tuning is a part of the overall iterative tuning procedure and is performed on the entire dataset. The impact of elimination of proxy PCs on the general accuracy is investigated using the χ 2 measure. We determine the χ 2 magnitudes for different filtering levels of the proxy PCs used in the fits, starting the filtration from the less important proxy PC's (e.g., for the filtering level 3, the three least important proxy PC's are removed from the fit). It was found that the fit quality starts to deteriorate above the filtration level 3. We thus presume that the filtering level 3 is optimum. The results are summarized in the Figure 13.  Figure 13. The magnitudes of χ 2 of the SW speed profile reconstruction performed using the generalized SW model with respect to the IPS data for different levels of proxy PC's filtering. Application of filtering levels above 3 causes deterioration of the fit quality. The filtering levels are marked below the plot; 0 means no filtering.

Analysis of the residuals
The deterioration of the fit quality towards the poles, which is seen in Figures 8 and 13, is analyzed in this section. From our study we found that there are two most probable sources of the fit deterioration in the polar regions: worse IPS statistics in the polar regions or varying physical parameters of the solar wind. It is commonly known that the polar regions have lower IPS statistics, which results in a reduced coverage of these regions on Carrington speed maps. The use of equi-areal bins in heliolatitude during IPS binning before fitting reduces the impact of the lower IPS statistics on the general results, but the low statistics may still impact the fit quality in the case of fitting to all years at once. On the other hand, empirical data show distinctive changes in the physical properties of the SW between the solar cycles observed in the IPS era. To qualify which of the two potential sources is more significant for the fit deterioration in the polar regions, we perform the following analysis. We gathered the IPS data into three subsamples, which correspond to the full solar cycles during the IPS era. We thus had the subsamples for SC22, SC23, and SC24. For each of the three subsamples, we separately sought for correlation with the solar proxies (i.e., we determined three matrices M SC22 , M SC23 , M SC24 ), and subsequently we reproduced the SW profiles for the years corresponding to a given matrix to compare the χ 2 for all years with the χ 2 for each of the three subsamples. It turned out that the deterioration on the poles is significantly lower for the full solar cycle subsamples of the IPS data. To see if this is an effect of a lower statistics, we repeated the same operation and divided the entire sample into 11-year subsamples, but this time choosing the years randomly from the period 1985-2021, with each of the considered years selected only once. We repeated this latter selection and fitting process 20 times. In both cases, i.e., when selecting the fit samples by solar cycles and randomly, we obtained a lower magnitude of χ 2 , i.e., a better quality of the fit around the polar regions (see Figure 14). This suggests that at this temporal resolution, we are not able to quantify to what extent the lower fit quality towards the poles is linked with the physical changes of the SW or with the reduced statistics. It also indicates that the model based on the matrix M from the fit to all available IPS years should be adopted, i.e., no division of the IPS data into subsamples is recommended when a yearly-based approximation of the SW speed profile evolution is used in the fitting.
The last aspect of analysis of the residuals is the very low magnitude of reduced χ 2 in the equatorial region. Both, the initial and filtered models have χ 2 much below 1. It is a result of application of the OMNI corrections on the IPS data in the situation when the OMNI SW speeds are subsequently used as one of the proxies. A contributing factor may be an overestimation of the data uncertainties due to the fact that their spread in the individual bins is not distributed exactly normally (i.e., Gaussian-like). This latter conclusion is drawn from analysis of Figure 14, where the magnitude of reduced χ 2 is consistently below 1 for almost all data points.

Comparison with the Ulysses data
Now a comparison of model based on the matrix M its predictions with the Ulysses SW speeds measured in situ (McComas et al. 1998(McComas et al. , 2000(McComas et al. , 2002(McComas et al. , 2008 is made. For the comparison purposes we use the three fast Ulysses scans and the yearly mean proxies as input to the generalized SW model. The results are shown in Figure 15, in which a general agreement between the predictions and Ulysses data is seen. However, some minor differences occur, mostly at the areas with large dynamic changes of SW. The problems with reproduction of the high dynamics of the SW speed changes is an issue of the generalization procedure, which was pointed out in Section 6.2. This issue is due to the fact that the generalization procedure relies on yearly mean profiles, and and the fitting is done for all years at once. It causes that the generalization may sometimes have a large inertia, which results in a worsening of the accuracy of the predictions because of a slow adaptation of the model to the rapidly changing SW. Nevertheless, the differences in the first and second fast scan are mostly within the uncertainties of the Ulysses data. The third fast scan, which was performed in 2007, features some systematic differences in the southern hemisphere. In general, the solar wind in 2007 was quite unique, since the SW speed profile featured a large N/S asymmetry close to the equator, which is also confirmed by the IPS data (see Figure 7, where the asymmetry in the original SW reconstruction in the vicinity of the equator by Porowski et al. (2022) is also visible). Since such a situation does not appear frequently during the other years, the generalization procedure was not able to reproduce this asymmetry and provided a systematic difference for this specific year. Likely, this difference might be reduced if data for a longer time interval had been available or a larger temporal resolution would be used during the fitting performed within the generalization procedure, assuming that the unusual asymmetric of profile was not due to some other factors. Summarizing, the generalization procedure will always be insensitive to rare events of yearly SW speed profile shape distortions, and only a larger input data with numerous occurrences of the rare SW long-scale configurations will make this procedure sensitive to them. Despite these drawbacks, the profits offered by generalization and the fact that unusual profiles are rather rare, make it a useful tool for analyses of the SW speed.

SUMMARY AND CONCLUSIONS
We discovered an empirical correlation between the profiles of SW speed and a set of solar proxies, which allowed us to extend the model of evolution of the latitudinal structure of the SW developed by Porowski et al. (2022) into times for which the IPS measurements are not available. We called this extension a Generalized SW Model. We showed that the Generalized SW Model provides relatively stable solutions, even among different model setups. The solutions provided by the Generalized Model converge when PCA filtering is applied to the SW model parameters (with the 1% criterion) and to the proxies. The proxy PCA filtering was carefully optimized based on analysis of the convergence of the solutions. This holds for a wide range of m values (8-22), as well as for small variations of the proxy set composition. The overall accuracy of the generalization is confirmed by comparing its predictions with the Ulysses data, as shown in Figure 15, which demonstrates the fidelity of the model.
The model calculates the profiles of the SW speed for the middle of a calendar year. For the years for which IPS measurements are available, the profiles are derived based on the IPS data; for those for which they are not available, the proxy-based model is used. The evolution of the speed and density of the SW resulting from the model for the entire time interval for which the IPS or proxy data are available, i.e., for 1976-2022, are presented in Figure 12. The analysis steps leading to the final model are summarized as follows: 1. Take synoptic Carrington maps of the solar wind speed retrieved from CAT analysis performed by Tokumaru et al. (2021)on IPS observations. Perform IPS data preparation as presented by Porowski et al. (2022). Calculate Carrington-averaged profiles of the solar wind speed cleaned from outliers, averaged over uniform latitudinal bands in the space of cos(φ), where φ is heliolatitude. 2. Define the Legendre base in the cos φ space with the constraint of disappearing the derivatives over φ at the poles, as defined in Equations 1, 2 in Porowski et al. (2022), for the order n = 20 and fit the parameters of the Legendre model for all of the available m years of the IPS data. The result is an m by n − 1 matrix of the model parameters.
3. Perform the PCA on this matrix. Take the resulting matrix of the transformation from the parameter space to the PC space and transform the Legendre base into the PCA base. Reduce the dimension of this base by admitting only the base vectors corresponding to the PCs that contribute at least 1% to the total variance (in our case, the 9 largest PCs); adopt these base vectors and the parameters as the basis for the reduced SW model. 4. Take the 9 proxies listed in Table 1, calculate yearly averages for all of them for the years that have the IPS measurements.
The result is a matrix of m by 9 yearly averages of the proxy values. 5. Perform the PCA on the proxies, transform the original proxy time series matrix into the PCA space. 6. Perform fitting of the speed profiles for the proxies from the time intervals when there are no IPS data available; to that end, use Equation 1 to obtain M and c. The result is SW speed profiles for the centers of calendar years. 7. Calculate the SW density profiles using the just-derived speed profiles based on the total SW energy flux (Le Chat et al. 2012) using the methodology employed by Porowski et al. (2022). We checked that the Generalized SW Model provides stable solutions both within the IPS data range  and outside this period. We found no systematic differences, but noticed an increasing spread of the residuals towards the solar poles. We found that this spread is neither due to the adopted model setup, nor to the proxy set used. We concluded that the Generalized SW Model for the 36 years of available IPS data returns some discrepancies at higher latitudes, especially during periods of high dynamics of the SW variations (i.e., during transition between the minima and maxima of the solar activity). Most likely, a larger temporal resolution of the initial SW model would be necessary to eliminate this effect.
On the other hand, the global accuracy of the SW profiles generated by the Generalized SW Model is satisfactory. The Generalized SW Model may be regarded as a useful tool to provide SW speed distribution for periods when no IPS observations are available. At the current state of analysis it seems impossible to conclude if an improvement of the χ 2 of the model observed when 11-year subsamples are used in the fitting is a statistical effect or if this is because the the relation between the SW parameters and the proxies used in the Generalized Model changes from one solar cycle to another. A resolution of this enigma may be obtained when the time resolution of the Generalized model is increased, which will be a topic of a separate study.
The Generalized SW Model is the newest extension of the WawHelIioIon 3DSW model of the evolution of the solar wind speed and density, and the resulting charge exchange rates of interstellar neutral species inside the heliosphere. This extension provides an opportunity to improve the current knowledge on the solar wind and its interaction with the interstellar medium.