Relation between Latitude-dependent Sunspot Data and Near-Earth Solar Wind Speed

Solar wind is important for the space environment between the Sun and the Earth and varies with the sunspot cycle, which is influenced by solar internal dynamics. We study the impact of latitude-dependent sunspot data on solar wind speed using the Granger causality test method and a machine-learning prediction approach. The results show that the low-latitude sunspot number has a larger effect on the solar wind speed. The time delay between the annual average solar wind speed and sunspot number decreases as the latitude range decreases. A machine-learning model is developed for the prediction of solar wind speed considering latitude and time effects. It is found that the model performs differently with latitude-dependent sunspot data. It is revealed that the timescale of the solar wind speed is more strongly influenced by low-latitude sunspots and that sunspot data have a greater impact on the 30 day average solar wind speed than on a daily basis. With the addition of sunspot data below 7.°2 latitude, the prediction of the daily and 30 day averages is improved by 0.23% and 12%, respectively. The best correlation coefficient is 0.787 for the daily solar wind prediction model.


Introduction
Solar wind is a continuous plasma flow from the upper atmosphere of the Sun that changes fundamentally at different phases of the solar cycle (Bame et al. 1976; Gosling et al. 1976; McComas et al. 2013). The solar wind speed is an important parameter in determining the solar wind dynamic pressure, which heavily influences the shape, structure, and physical processes of the magnetosphere (Shue et al. 1997; Lin et al. 2010; Li et al. 2011, 2013; Yu et al. 2016). During the solar minimum, the solar wind speed varies with latitude: the fast solar wind at high latitudes originates from the polar coronal holes, and the slow solar wind originates from the closed magnetic field lines at the solar equator (Balogh et al. 2008; Brooks et al. 2015; Tian et al. 2021). During the solar cycle maximum, there is an increase in transient disturbances in the solar wind associated with flares and coronal mass ejections (CMEs), and the early formulations of solar wind theory suggest that the flow speed should increase with solar activity (Parker 1965, 1969). However, some observations indicated that high-speed streams in the ecliptic solar wind are far more common during years of declining, or minimum, solar activity than near the solar maximum (Gosling et al. 1977; Coles et al. 1980; McComas et al. 2002, 2003; Manoharan 2012). Solar wind is the major medium in interplanetary space affecting the Sun-Earth relationship (Temmer 2021). Understanding the variability in the solar wind speed with the phase of the sunspot cycle is therefore of utmost importance for obtaining reliable solar wind forecasts.
In recent years, many models have been created to predict solar wind. There are two main types: physics-based magnetohydrodynamic (MHD) models and empirical models. Long before solar wind was observed, scientists used theories to predict its existence (Parker 1965). Since then, researchers have developed a large number of physical models, such as the ENLIL model, which is currently in use at the National Oceanic and Atmospheric Administrationʼs (NOAA) Space Weather Prediction Center, the recently developed European Heliospheric Forecasting Information Asset, and the Space Weather Modeling Framework at the Center for Space Environment Modeling (Linker et al. 1999; Arge & Pizzo 2000; Odstrcil 2003; Pomoell & Poedts 2018). These models are based on the physical principle that closed magnetic field lines confine the solar wind plasma and open magnetic field lines accelerate the solar wind flow to supersonic propagation into the heliosphere. In global heliospheric solar wind simulations, it was also found that the solar wind variability is strongly correlated with the solar cycle, with the corona and heliosphere being more variable at the solar maximum. However, limiting the analysis of the simulations to the solar equatorial region strongly reduces the difference between solar maxima and minima (Owens et al. 2022).
Empirical models have been developed since the major discovery by Wang & Sheeley (1990) of an empirical relationship between the flux tube expansion factor and solar wind. Empirical models are also popular tools for solar wind prediction because they are more efficient and simpler than complex physical models (Owens et al. 2008). Wintoft & Lundstedt (1999) used a neural network approach to study the daily average solar wind characteristics. They used the flux tube expansion factor as the input parameter and could predict the solar wind speed two days in advance with a test set correlation of 0.53. Currently, with advances in machine learning, this approach is also being applied to solar wind forecasting (Upendran et al. 2020; Sun et al. 2021, 2022). There are also many machine-learning models that predict solar wind based on the relationship between coronal holes and solar wind speed, which was first shown in Krieger et al. (1973). Luo et al. (2008) proposed a new forecast index representing the solar magnetic field strength, which has a good correlation with the solar wind speed, and obtained good forecasting results. Upendran et al. (2020) used extreme-ultraviolet images of the solar corona to predict the solar wind speed measured at Lagrange Point 1 (L1) and obtained a best-fit correlation of 0.55 ± 0.03 with the observed data. Bailey et al. (2021) presented a machine-learning approach to use in place of the Wang-Sheeley-Arge model for predicting solar wind speed in the near-Earth vicinity, and they found that the predictions of the solar wind were less accurate during more active solar cycles. A study by Brown et al. (2022) found that their model had better forecasts during the declining phase of the solar cycle. Owens et al. (2019) predicted the near-Earth solar wind using corotation from Lagrange Point 5 (L5) and found that the latitudinal difference between the Earthʼs orbit and the solar equator would introduce errors into the prediction.
A number of other solar wind forecasting methods have been developed. The solar wind has a periodicity of approximately 27 days, which is caused by the rotation of the Sun. One prediction model exploits this recurrence: the near-Earth solar wind conditions today are assumed to be identical to those observed 27 days earlier. Based on the 27 day solar rotation cycle, this method performs better than MHD simulations during the solar minimum (Owens et al. 2013). Bussy-Virat & Ridley (2014) predicted the solar wind speed 5 days in advance using a probability distribution function model based on the periodicity of the solar wind velocity related to the rotation of the Sun. However, during the solar maximum, CMEs erupt violently and destroy the steady structure of the 27 day periodic solar wind, leading to uncertainty in the empirical method.
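The 27 day recurrence idea described above can be illustrated with a minimal sketch. The function name `recurrence_forecast` and the synthetic speed values are hypothetical and serve only to show the mechanics; they do not come from the models cited.

```python
def recurrence_forecast(speeds, period=27):
    """Forecast each day's speed as the value observed `period` days earlier."""
    forecasts = []
    for day in range(len(speeds)):
        if day < period:
            # No observation exists one full rotation back yet.
            forecasts.append(None)
        else:
            forecasts.append(speeds[day - period])
    return forecasts


# Synthetic, perfectly recurrent series (km/s): the same 27 values repeated twice.
one_rotation = [400 + 10 * (d % 9) for d in range(27)]
observed = one_rotation * 2
predicted = recurrence_forecast(observed)
```

For a perfectly recurrent series such as this one, the forecast reproduces the second rotation exactly; real solar wind departs from this whenever transients such as CMEs disturb the recurrent structure.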
An important scientific goal of the space weather research and prediction community is to determine the relationship between the Sun and solar wind to specify the Sunʼs influence on the overall solar-terrestrial space environment (Paulikas & Blake 1976; Rostoker et al. 1998; Li et al. 2001a; Barker et al. 2005; Liu et al. 2010, 2015; Cao et al. 2011, 2015; Zhang et al. 2015, 2022a, 2022b). The main influence on the near-Earth space environment is the solar wind in the ecliptic plane, while there is a tilt between the solar equator and the ecliptic of up to 7.°25 (Rosenberg & Coleman 1969; Li et al. 2001b). The sunspot number is a commonly used index of solar activity, and sunspots occur mostly within a latitudinal range of the solar disk (Solanki 2003; Hathaway 2015). Since the interpretation of the variable solar wind streams is complicated by the structure and evolution of the solar source region and the solar rotation effect, in this paper we investigate the impact of latitude-dependent sunspot number on the solar wind speed to understand the modulation of the solar wind. Section 2 introduces the data used in this study as well as how the data were processed. It also presents the dependence of the solar wind speed on the sunspot cycle from the observations. Section 3 discusses two methods, the Granger causality analysis method and the machine-learning prediction model, which were used to study the relationship between sunspot number and solar wind speed. Our conclusions and discussion are presented in Section 4.

Data
The sunspot data for this study are from the Royal Greenwich Observatory (RGO) and US NOAA data available online. RGO compiled sunspot observations from a small network of observatories to produce a data set of daily observations starting from 1874 May. After RGO ceased its program, the US Air Force started compiling data from its own Solar Optical Observing Network. This program has continued with the help of NOAA, and much of the same information has been compiled through to the present. This sunspot data set contains the area and latitudinal position of daily sunspot active regions. The sunspot number in this study is obtained by counting the number of sunspot groups recorded every day, which is approximately 10 times the number of sunspots.
The sunspot data are investigated in this study for different latitude ranges: the sunspot number at all latitudes, SN all; the sunspot number at latitudes less than 20°, SN 20; the sunspot number at latitudes less than 15°, SN 15; and the sunspot number at latitudes less than 10°, SN 10. Taking into account the angle of 7.°2 between the Earthʼs orbit around the Sun and the solar rotational equator, the sunspot number at latitudes less than 7.°2, SN 7.2, is also considered. The sunspot area data are also investigated by latitude, and the five data sets are SA all, SA 20, SA 15, SA 10, and SA 7.2. Sunspot areas are given in units of millionths of a solar hemisphere (μHem).
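As an illustration of how such latitude-dependent series could be assembled, the sketch below counts sunspot groups and sums their areas per day within a latitude band. It assumes that "latitudes less than" refers to absolute heliographic latitude in both hemispheres, which the text does not state explicitly; the record layout, the function name `daily_latitude_counts`, and the sample values are hypothetical.

```python
from collections import defaultdict


def daily_latitude_counts(groups, max_abs_latitude=None):
    """Return {date: (group_count, total_area)} for groups inside the band.

    groups: iterable of (date, latitude_deg, area_uhem), one per sunspot group.
    max_abs_latitude: keep groups with |latitude| < this value; None keeps all.
    Days with no group inside the band are simply absent from the result.
    """
    counts = defaultdict(lambda: [0, 0.0])
    for date, latitude, area in groups:
        if max_abs_latitude is None or abs(latitude) < max_abs_latitude:
            counts[date][0] += 1
            counts[date][1] += area
    return {date: (n, a) for date, (n, a) in counts.items()}


# Made-up records: (date, latitude in degrees, area in millionths of a hemisphere).
records = [
    ("1980-01-01", 22.0, 120.0),
    ("1980-01-01", -12.5, 300.0),
    ("1980-01-01", 5.0, 80.0),
    ("1980-01-02", -6.9, 50.0),
]
bands = {"all": None, "20": 20.0, "15": 15.0, "10": 10.0, "7.2": 7.2}
series = {name: daily_latitude_counts(records, limit) for name, limit in bands.items()}
```

Each entry of `series` then plays the role of one (SN, SA) pair, for example `series["7.2"]` for SN 7.2 and SA 7.2.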
The solar wind data used in this study are obtained from the OMNI database, which provides hourly averaged magnetic field and plasma parameter data propagated to near-Earth space from 1963 to the present, collected by spacecraft (ISEE 3, Wind, and ACE) around the L1 point (King & Papitashvili 2005). Since the sunspot data are recorded on a daily basis and the variation in the daily solar wind speed is not large, we investigate the daily averaged solar wind speed, which is calculated from the hourly values.
The observation data from 1963 to 2018 are shown in Figure 1, which indicates the daily latitude-dependent sunspot group numbers with different colors (as labeled). The figure shows that the peaks of the five latitude-dependent sunspot number data occur at different times. It is well-known that as the cycle progresses, the range of latitudes of sunspots broadens, and the central latitude moves toward the equator (Hathaway et al. 2003). This behavior, visualized in the solar "Butterfly Diagram," is known as Spörerʼs law. Spörerʼs law expresses that the latitudes at which flux emerges depend on the solar cycle phase. At the start of each cycle, spots appear at latitudes above approximately 30°-45° and then tend to progressively emerge at lower latitudes as the cycle progresses (Hathaway 2015). To observe this phenomenon more visually, we fitted the five latitude-dependent sunspot number data with polynomial curves. The fitting results are shown in Figure 1(b), where the colors are the same as in Figure 1(a). The dashed lines mark the peak of the sunspot group number for different latitudes. For example, for solar cycle 21 from 1976 March to 1986 September, the peaks of SN 20, SN 15, SN 10, and SN 7.2 occur after 0.53, 0.99, 1.53, and 1.78 yr, respectively, relative to the peak of SN all. We can directly see that the peak time is delayed as the latitudinal range of the sunspot data decreases.
The daily averaged solar wind speed during the five solar cycles is shown in Figure 1(c). The data are processed to remove some gaps in the observations. In our study, the long-term missing values (more than three days) are removed, and the short-term missing (less than three days) solar wind data are replaced by the average. The histogram of the annual average solar wind speed is shown in Figure 1(d), where the peak of the solar wind speed generally occurs during the declining phase of each solar cycle, indicated by the green shaded area. The criterion for dividing the solar cycles is obtained from the website.
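The gap handling described above can be sketched as follows. Two details are assumptions, since the text does not specify them: a short gap is filled with the mean of the two observations that bound it, and a gap of exactly three days is treated as short (the `max_fill` parameter makes this boundary adjustable). The function name `clean_gaps` and the sample values are hypothetical.

```python
def clean_gaps(values, max_fill=3):
    """Fill short gaps with the mean of the bounding observations; drop long gaps.

    values: list of daily speeds with None marking a missing day.
    Returns (day_index, speed) pairs for the days that are kept.
    """
    cleaned = []
    i, n = 0, len(values)
    while i < n:
        if values[i] is not None:
            cleaned.append((i, values[i]))
            i += 1
            continue
        j = i
        while j < n and values[j] is None:
            j += 1  # j ends at the first index after the gap
        gap_length = j - i
        has_neighbours = i > 0 and j < n
        if gap_length <= max_fill and has_neighbours:
            fill = (values[i - 1] + values[j]) / 2.0
            cleaned.extend((k, fill) for k in range(i, j))
        # Otherwise the whole gap is dropped from the series.
        i = j
    return cleaned


speeds = [400.0, None, 420.0, None, None, None, None, 500.0]
result = clean_gaps(speeds)
```

Here the one-day gap is filled with 410.0, while the four-day gap is removed.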

Relationship between Sunspot Number Data and Solar Wind Speed
In this section, two methods are utilized to study the relationship between latitude-dependent sunspot number and solar wind speed.

Granger Causality Test
The first method, the Granger causality test, is used to demonstrate the relationship between latitude-dependent sunspot number and solar wind speed. This method was first used to determine the causal relationship between two related time series, and whether feedback occurs, in cases where this is difficult to establish directly (Granger 1969, 2003).
The Granger causality test assumes that all the information about the prediction of each variable is contained in the time series of these variables. For example, given two time series, x and y, the method verifies whether x predicts a change in the time series y. First, a vector autoregressive (VAR) model is constructed based on the two time series, and the parameters of the VAR model are estimated by the method of least squares. If x is the cause of the change in y, the coefficients of lagged x on y in the VAR model are not zero. Then, we proceed with hypothesis testing on the regression model parameters using the F-test, which determines whether the regression model is statistically significant. Finally, we calculate the p-value, which represents the probability of observing the F-statistic or a more extreme value under the null hypothesis. The p-value is used to decide whether to reject the null hypothesis. Because the null hypothesis of the Granger causality test is that the regression coefficients of lagged x are zero, if x is the cause of the change in y, the null hypothesis is rejected and the p-value is less than 0.05, which means that data at least this extreme would occur less than 5% of the time under the null hypothesis (Anderson 1995). In contrast, if the p-value is greater than 0.05, the null hypothesis that x is not the cause of the change in y cannot be rejected. Therefore, Granger causality tests are applied to investigate the relationship between solar wind speed and sunspot number.
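For reference, the procedure above can be written compactly in its standard bivariate textbook form for a lag order p. The symbols a_i, b_i, T, RSS_r, and RSS_u are notation introduced here for illustration and are not taken from the paper: T is the number of observations used in the regression, RSS_u is the residual sum of squares of the full model, and RSS_r is that of the restricted model with all b_i set to zero.

```latex
y_t = a_0 + \sum_{i=1}^{p} a_i\, y_{t-i} + \sum_{i=1}^{p} b_i\, x_{t-i} + \varepsilon_t,
\qquad
H_0:\; b_1 = b_2 = \dots = b_p = 0,

F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
\;\sim\; F(p,\; T - 2p - 1) \quad \text{under } H_0 .
```

The p-value is then the upper-tail probability of this F distribution at the observed value of F.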
The annual average latitude-dependent sunspot data are chosen as the x variable and the solar wind speed as the y variable in the Granger causality test. The variation in p-values with time delay, ranging from 1 to 10 yr, is shown in Figure 2. According to the Granger causality hypothesis, the x variable is the cause of the y variable only when the p-value is less than 0.05, as indicated by the red dashed line in Figure 2. The test results for the five latitude-dependent sunspot numbers are shown by different colors and show that the impact of sunspot number on the solar wind speed is related to latitude. For a time delay of 1 yr, the p-values for SN all and SN 20 are 0.18 and 0.06, larger than the threshold of 0.05, meaning that SN all and SN 20 are not the cause of solar wind variations, while the p-values for SN 15, SN 10, and SN 7.2 are 0.03, 0.01, and 0.01, indicating that the variations in solar wind are correlated with the sunspot number in these latitude ranges. This suggests that sunspot numbers closer to the solar rotational equator have stronger effects on the solar wind speed. The best test result is for the time delay of 3 yr with p-values of 0.01, 0.005, 0.01, 0.01, 0.007, and 0.008 for the five sunspot number data, which are all approximately 0.01 and less than the threshold value of 0.05. For a delay of 5 yr, the p-values of the five sunspot data sets are larger than 0.05, indicating a weak effect on the variation in solar wind speed. SN all and SN 20 affect the solar wind speed only from a time delay of 3 yr, with p-values less than 0.05, while SN 15, SN 10, and SN 7.2 already affect the solar wind speed after a 1 yr delay, with p-values less than 0.05. This suggests that the time delay between the annual average solar wind speed and sunspot number is reduced when the latitude range decreases. From this result, we suggest that the latitudinal migration of sunspots may correlate with the peak of solar wind speed occurring in the declining phase of the solar cycle. In the next subsection, we will examine the relationship from the perspective of solar wind prediction.

Machine Learning
The second method used in this study is the gradient boosting regression method, one of the machine-learning approaches. This algorithm is an ensemble algorithm that trains multiple learners, ensuring that even if a few learners make errors, the results can be corrected by the majority of the learners. Gradient boosting regression is an analytical technique that is designed to explore the relationship between two or more variables. Its analytical output identifies important factors impacting the dependent variable and the nature of the relationship between each of these factors and the dependent variable (Friedman 2001; Natekin & Knoll 2013). We can therefore study the relationship between latitude-dependent sunspots and the solar wind speed by predicting the solar wind speed with machine-learning models.
The steps of machine learning are as follows. First, we take the solar wind speed of the previous day as the input parameter. The solar wind speed is highly autocorrelated over a short period. Thus, the model can use the solar wind autocorrelation to predict the future solar wind speed. To study the impact of latitude-dependent sunspot number on the solar wind speed, we also add three days of previous sunspot data containing sunspot number and sunspot area into the input data. Next, the data are divided into an 80% training set and a 20% test set. The training set is between 1963 and 2007 and contains four solar cycles, and the test set covers the complete 24th solar cycle. Then, we set the machine-learning parameters. In our model, the parameters are determined based on empirical methods, with a learning rate of 0.1 and the depth of the tree set to 5, since a deeper tree will tend to overfit. In machine learning, feature selection can reduce the input parameters and the training time to optimize the model. However, because our input parameters are fixed in advance, we omit the feature selection step.
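The input construction and the chronological 80%/20% split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the "three days of previous sunspot data" are taken to be the values one, two, and three days before the target day, and the helper names `build_samples` and `chronological_split` and the synthetic series are hypothetical.

```python
def build_samples(speed, sn, sa, sunspot_lags=3):
    """Build (features, target) rows for one-day-ahead prediction.

    Features for day t: speed[t-1], then sn[t-1..t-3] and sa[t-1..t-3].
    Target for day t: speed[t].
    """
    features, targets = [], []
    for t in range(sunspot_lags, len(speed)):
        row = [speed[t - 1]]
        row += [sn[t - k] for k in range(1, sunspot_lags + 1)]
        row += [sa[t - k] for k in range(1, sunspot_lags + 1)]
        features.append(row)
        targets.append(speed[t])
    return features, targets


def chronological_split(features, targets, train_fraction=0.8):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])


# Synthetic daily series, for illustration only.
speed = [400.0 + 5.0 * d for d in range(13)]
sn = [float(d % 4) for d in range(13)]
sa = [10.0 * (d % 4) for d in range(13)]
X, y = build_samples(speed, sn, sa)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

The resulting rows could then be passed to any gradient boosting regressor configured with a learning rate of 0.1 and a maximum tree depth of 5, for example `GradientBoostingRegressor(learning_rate=0.1, max_depth=5)` from scikit-learn; the text does not state which implementation was used. Dropping the sunspot columns from each row gives the control (self-prediction) input.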
The daily prediction results of solar wind speed for the period 2016-2017 are shown in Figure 3. The observations are indicated by the black line in the figure, with the prediction result of the control group shown with the blue line, and the result with added SN 7.2 and SA 7.2 data is indicated by the red line. It can be seen that both the prediction models with and without sunspot information generally reproduce the variations in the solar wind well, with small differences between the two models. The correlation coefficient (CC) between the solar wind self-prediction results and observations in the full test set is 0.785 for the period from 2008 to 2018. After adding the SN 7.2 and SA 7.2 data into the self-prediction model, the CC is 0.787. The results of the two prediction models are shown in Figure 4. This indicates that the SN 7.2 and SA 7.2 data have little effect on the daily solar wind speed prediction. The results of daily solar wind speed prediction with added sunspot data for other latitudes are summarized in the first row of Table 1. To study the impact of latitude-dependent sunspot data on the solar wind speed at different time scales, we also study the 15 day average and 30 day average solar wind forecasts.
The prediction results for the 30 day average solar wind speed are shown in Figure 5 for the period 2007-2018. In Figure 5, the observations are indicated by the black line, the self-prediction result is shown by the blue line, and the prediction result with added SN 7.2 and SA 7.2 data is indicated by the red line. It can be seen that the red line, predicted with added sunspot data, fits the observations better than the blue line. The CC values between the predicted results and the observed data for the self-prediction model and the model with added sunspot information are 0.597 and 0.673, respectively. With the addition of sunspot information, there is an approximately 12.7% improvement in the prediction efficiency. This means that sunspot information has an impact on the 30 day solar wind speed variation. The 15 day average solar wind is also predicted with the sunspot information added, which is shown in the second row of Table 1.
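The quoted improvement follows from the two CC values if improvement is taken as the relative increase over the control (self-prediction) model. The short check below makes the arithmetic explicit; the function name `percent_improvement` is hypothetical.

```python
def percent_improvement(cc_control, cc_with_sunspots):
    """Relative increase of the CC over the control model, in percent."""
    return (cc_with_sunspots - cc_control) / cc_control * 100.0


# 30 day average: CC of 0.597 without and 0.673 with SN 7.2 and SA 7.2.
improvement_30day = percent_improvement(0.597, 0.673)
print(f"{improvement_30day:.1f}%")  # prints 12.7%
```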
The results of the prediction above are summarized in Table 1, showing the linear CCs between the predicted results and the observed data for three temporal scales with different input parameters. Improvement represents the increase relative to the control group after adding different sunspot data. The first row in Table 1 shows the daily result, illustrating that the improvement of the solar wind speed prediction model gradually increases as the latitude of the added sunspot data decreases, but the improvement for daily predictions is very weak. For daily wind speed, the CC improves by 0.23% with sunspot data at latitudes below 7.°2. For the 15 day average wind speed, the improvement in the CC is 24% by adding sunspot data with latitudes below 10°. However, by adding sunspot data at all latitudes, the CCs of the predictions are reduced. For the 30 day average wind speed, all latitude-dependent sunspot data improve the prediction models. In particular, the CC improves by 12.7% with sunspot data at latitudes below 7.°2. This method also performs better on 30 day units, which is probably due to the Sunʼs rotation period of approximately 27 days. Compared to the daily and 30 day average solar wind speeds, the 15 day average has the worst prediction efficiency.
The above results reflect the impact of latitude-dependent sunspot data on the solar wind speed at different time scales. The near-Earth solar wind speed is more strongly influenced by low-latitude sunspots, and sunspot data have a greater impact on the 30 day average solar wind speed than on a daily basis.

Discussion and Conclusion
As shown in Figure 1, there is a time delay between the peak of the solar wind and the peak of the sunspot number. This signal has been studied previously. For example, Bame et al. (1976) and Gosling et al. (1976) found that large amplitude high-speed solar wind streams are more commonly observed in years of declining and minimum solar activity than near solar maximum, and they considered that changes in the frequency and nature of solar wind stream structures appear to be directly related to the long-term evolution of coronal holes (CHs), which are regions of low density in the solar corona. However, limited by the observational data, their conclusions are only apparent within one solar activity cycle. Later studies (Levine et al. 1977; Owens et al. 2005; Owens & Forsyth 2013) demonstrated that different sources of solar wind should cause different flow rates and that high-speed winds mainly originate from large CHs and CMEs. The CHs and CMEs have different solar cycle evolutions in the time phase (large CHs are in antiphase with sunspot numbers), so different sources of solar wind lead to different speeds of solar wind, thus displaying different cycle behaviors (Leinert & Jackson 1998; Owens et al. 2017). Changes in the coronal structure over the sunspot cycle are associated with the evolution of the solar magnetic field and influence the transmission of solar wind into interplanetary space and eventually to the Earth (Hundhausen et al. 1981, 1984). At sunspot minimum, there are large polar coronal holes that, however, do not affect the Earth because the fast solar wind emanating from them does not reach the ecliptic. At sunspot maximum, there are small short-lived coronal holes scattered at all latitudes, giving rise to short and relatively weak high-speed streams (Luhmann et al. 2002).
During the maximum years, the proliferation of sunspots confines the open magnetic field to the solar surface, and the coronal hole areas are small. As the solar cycle develops, pairs of sunspots separate, with the leading sunspots migrating toward the equator and the following sunspots moving poleward, and the open magnetic flux on the solar surface (where the high-speed solar wind originates) increases, which in turn creates the peak in the declining phase (Georgieva 2011). As shown by Wang et al. (2002), geomagnetic activity reaches a maximum in the sunspot declining phase when polar coronal holes have already formed and low-latitude holes begin attaching themselves to their equatorward extensions and growing in size, so the Earth is embedded in wide and long-lasting fast solar wind streams. Georgieva (2011) shows that the time between the sunspot maximum and the geomagnetic activity maximum in the sunspot declining phase is the time it takes the solar surface meridional circulation to carry the remnants of sunspot pairs from sunspot latitudes to the poles. The relationship between sunspots and solar wind requires exploration. Sunspots are a long-established set of observations and have an important role in the study of solar-terrestrial relations. In this study, the OMNI data also show that the peak of the annual average solar wind speed occurs during the declining phase of the solar cycle, which is shown in Figure 1(d). This implies that this phenomenon is widespread and that there is perhaps a time delay between the peak solar wind speed and the sunspot number. As the central latitude slowly drifts toward the equator, the time when the sunspot number reaches its maximum shifts later, closer to the peak of the annual average solar wind speed. Then, according to the Granger causality test results, the low-latitude sunspot number directly correlates with the solar wind speed, and the time delay between the two peaks is reduced. The latitudinal position of sunspots correlates with the peak solar wind speed occurring during the declining phase of the solar cycle.
We perform a cross correlation between the yearly average solar wind speed and sunspot number with a time delay, and the results are shown in Figure 6. The CCs between the sunspot number at different latitudes and the solar wind speed reach their maxima at different times. The solar wind speed is most correlated with sunspots at low latitudes (SN 10 and SN 7.2) during the second year, with midlatitude sunspots (SN 15 and SN 20) during the third year, and with all sunspot data (SN all) during the fourth year. This result is consistent with the results of the Granger causality test. This means that the solar wind speed may peak approximately 4 yr after the sunspot number reaches its maximum. The time delay of the correlation between the annual average solar wind speed and the sunspot number decreases with decreasing latitude. This suggests that the latitude factor is not negligible.
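A lagged cross correlation of this kind can be sketched as below: the Pearson CC is computed between the sunspot series and the wind speed series shifted by each candidate delay, and the delay with the largest CC is selected. The function names and the synthetic series (in which the second series is an exact copy of the first delayed by two steps) are hypothetical and are not the data behind Figure 6.

```python
from math import sqrt


def pearson(a, b):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlations(x, y, max_lag):
    """CC between x[t] and y[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        result[lag] = pearson(x[: len(x) - lag], y[lag:])
    return result


# Synthetic yearly values: `wind` repeats `sunspots` with a two-step delay.
sunspots = [3, 8, 1, 9, 4, 7, 2, 6, 5, 10, 0, 11]
wind = [5, 5] + sunspots[:-2]
ccs = lagged_correlations(sunspots, wind, max_lag=4)
best_lag = max(ccs, key=ccs.get)
```

In this constructed example `best_lag` is 2; applied to each latitude-dependent sunspot series in turn, the same procedure would yield one delay per latitude range.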
The relationship between solar wind speed and other solar physical parameters is the basis of empirical prediction models, which are important research aspects for space weather. In previous empirical forecasting models, the negative correlation between solar wind speed at 1 au and the expansion factor of the coronal magnetic field at the source surface has been widely used (Owens et al. 2005; Macneice 2009; Reiss et al. 2016). Then, another parameter, the angular distance from the coronal hole boundary (Riley et al. 2001), was added by Arge et al. (2003) to improve the method for solar wind prediction. The other popular empirical model relies on the relation between coronal hole area and solar wind speed, which was discovered many years ago (Nolte et al. 1976; Sheeley et al. 1976; Levine et al. 1977). With the development of computer technology, many machine-learning models have also been developed on the basis of empirical relationships. Upendran et al. (2020) used a deep learning approach to predict the daily solar wind speed at the L1 point based on extreme-ultraviolet images of the solar corona and obtained a best-fit correlation of 0.55 ± 0.03 with the observed data. Sun et al. (2022) proposed a Graph-Temporal-AR model to uncover the complex relationships among solar wind features and identify temporal dependencies. Yang et al. (2018) used an artificial neural network to predict solar wind averaged over several days at 1 au with a CC of 0.74. In this study, the gradient boosting regression machine-learning method was used to predict the solar wind speed at different time scales to study the relationship between sunspots and solar wind. The sunspot data have little effect on the daily solar wind speed prediction, with a 0.23% improvement achieved by adding SN 7.2 and SA 7.2. For the 30 day average solar wind speed, the maximum improvement of the prediction model is 12.7% with SN 7.2 and SA 7.2. For the 15 day solar wind speed, the maximum improvement of the prediction model is 24% with SN 10 and SA 10, but the prediction is not satisfactory. Sunspot data have a greater correlation with the 30 day average solar wind speed prediction than with that on a daily basis. For predicting the daily averaged solar wind speed one day in advance, we have attained a CC of 0.787 by incorporating latitude-dependent sunspot data into the daily solar wind prediction model.
The solar wind has a high correlation with itself at the daily timescale. However, the correlation deteriorates as the delay time increases. Due to the solar rotation effect, the correlation recovers again at a timescale of approximately 27 days. This is probably the reason that the prediction CCs for daily and 30 day averages are higher than those for 15 day averages. Sunspots can represent the solar activity level. The solar wind can remain stable in the short term, provided that there is no influence of violent solar activity and the interplanetary environment. Therefore, 30 day solar wind can be predicted by empirical methods, and sunspot data can be added to improve the prediction efficiency.
The relation between sunspot active regions and solar wind speed has been studied previously. For example, Stansby et al. (2021) estimated what fraction of solar wind originates in active regions as a function of latitude and found that the fractional contribution of active regions to the solar wind is negligible at solar minimum and typically 40%-60% at solar maximum, scaling with sunspot number and time. Ma et al. (2022) indicated that the magnetic polarity of an active region should contribute to the solar wind if it includes the potential-field source-surface open magnetic field lines. He et al. (2010) concluded that the upflowing plasma at an active region boundary may evolve into an intermediate-speed solar-wind stream observed near the Earth. This solar-wind stream has a speed of about 400 km s −1 and an intermediate temperature and density compared to the typical fast and slow solar wind.
This study discusses the impact of latitude-dependent sunspot data on solar wind speed. We verified the relationship between sunspots and the solar wind speed using two separate methods. It is shown that our model is able to identify some significant correlations between sunspot active regions and solar wind speed without built-in physics knowledge. Machine-learning approaches can help us explore hitherto unknown relationships of various physical data within the heliosphere.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Figure 1. Observation data of sunspot and solar wind speed from 1963 to 2018. (a) The daily latitude-dependent sunspot number. (b) The fitting result of daily sunspot number. (c) The daily solar wind speed. (d) Histogram of the annual average solar wind speed; the declining phase of each solar cycle is shown in green.

Figure 2. Results of Granger causality analysis of latitude-dependent sunspot data and annual average solar wind speed. The red dashed line indicates the confidence level p-value equal to 0.05.

Figure 3. Plot of the measured and predicted daily mean solar wind speed. The black line indicates the solar wind observation data; the blue line indicates the prediction result of solar wind speed as input; the red line indicates the result of adding sunspot data with latitudes less than 7.°2.

Figure 4. Plot of daily solar wind speed prediction results. The blue point indicates the prediction result of solar wind speed as input; the red line indicates the result of including sunspot data with latitudes less than 7.°2.

Figure 5. Plot of the measured and predicted 30 day average solar wind speeds. The black line indicates the solar wind observation data; the blue line indicates the prediction result of solar wind speed as input; the red line indicates the result of adding sunspot data with latitudes less than 7.°2.

Table 1
Comparison of the Solar Wind Model Prediction Results with Different Input Parameters
Note. CC denotes the correlation coefficient; SN with subscript indicates the latitude-dependent sunspot number; SA with subscript indicates the latitude-dependent sunspot area; and %Improvement denotes the percentage of model enhancement with the addition of sunspot data.