Evaluating the concordance of Egyptian and international sunspot observations

This study provides an exhaustive examination of the evolution of sunspot number (SSN) observations within Egypt, a nation celebrated for its profound astronomical legacy. Although Egypt has a well-established tradition of solar observation, the local SSN records spanning from 2010 to 2022 are compromised by a considerable frequency of absent data, thereby presenting substantial challenges to the precise assessment of solar activity. Addressing this challenge, the study employs dynamic time warping (DTW) as a methodological tool to assess the alignment of local and global SSN datasets. This technique adeptly harmonizes these datasets by reconciling temporal inconsistencies and variations in sampling rates. Subsequent to the application of DTW, the research integrates orthogonal regression for the imputation of the absent values in the Egyptian SSN dataset. This method, preferred for its proficiency in managing errors in both the dependent and independent variables, deviates from conventional linear regression techniques, thereby providing a more nuanced approach to data approximation. The investigation delineates a noteworthy statistical association between the locally-estimated SSN values and the global SSN indices. This correlation is characterized by a consistent pattern in which the locally-derived SSN are systematically lower in comparison to their global counterparts. Nevertheless, these local values display parallel trends and seasonal fluctuations akin to those observed in the global dataset, validating the imputation method and highlighting the unique characteristics of the Egyptian SSN data within the global context of solar activity monitoring. The implications of these findings are significant for the discipline of solar physics, especially for regions contending with incomplete datasets. 
The methodologies advanced in this research offer a robust framework for the enhancement of datasets with missing data, thus broadening the comprehension of solar phenomena.


Introduction
Sunspot observations are crucial in solar astronomy, serving as key indicators of solar activity and revealing the interaction of magnetic fields, plasma dynamics, and energy emissions. They provide insights into the solar cycle, an approximately eleven-year period of fluctuating solar activity, and help in understanding solar magnetism and space weather phenomena. Sunspot analysis also enhances our knowledge of stellar processes [1,2].
The monitoring of sunspots involves a collaborative effort among a global network of observatories. These observatories collect data on both global and local sunspot numbers, offering a comprehensive view of solar activity. Globally, the contributions of this network are combined into a comprehensive dataset that reflects solar activity dynamics. This international collaboration ensures data consistency and reliability, vital for space weather forecasting and understanding solar cycle impacts on Earth's climate.
Helwan [9,10]. Traditional observation methods, including sketching sunspots, coexist with archiving, creating a rich repository of both visual and digital data. This extensive database is invaluable for longitudinal studies of solar activity and contributes significantly to global solar research. NRIAG has made significant contributions to understanding astronomical phenomena, particularly through its detailed observations of solar and lunar eclipses, and planetary transits. The transparent skies over Helwan enabled astronomers to gather numerous valuable photographic plates. These included images of nebulae, galaxies, and comets, notably Comet Halley during its close approach to the Sun in 1910. They also captured images of Jupiter's moons, the planet Pluto, the Moon, and various stellar observations [14]. NRIAG's documentation of several partial solar eclipses, including those in 2003, 2005, 2006, 2010, 2011, 2013, 2015, 2020, and 2022, has been crucial in advancing our knowledge of solar dynamics and the Sun's impact on Earth. NRIAG's work on lunar eclipses is equally notable, encompassing observations of total eclipses in 2001, 2003, 2015, 2018, 2019, and 2020, and various partial and penumbral eclipses. These studies have provided valuable insights into Earth's orbital dynamics and atmospheric properties. Additionally, NRIAG's monitoring of Mercury transits in 2003, 2016, and 2019, as well as Venus transits in 2004 and 2012, has greatly enhanced our understanding of planetary orbits and atmospheric characteristics. These cumulative efforts not only advance scientific knowledge but also promote educational and public engagement in astronomy. NRIAG's comprehensive research in these areas highlights its key role in both regional and global scientific communities, contributing to a deeper understanding of celestial dynamics and fostering a continued interest in astronomical studies [15].
Starting in 1964, the observatory has focused primarily on solar research and monitoring artificial satellites. All the smaller telescopes at the observatory are part of the department of solar research and space sciences. These telescopes support both research programs and educational purposes. In 1964, the observatory installed a 6-inch Coudé refractor telescope, equipped with a photographic solar camera. This telescope has been used to observe sunspots daily, covering solar cycles 20, 21, and 22. For over thirty years, monthly reports detailing observations of the solar photosphere have been sent to world solar data centers. The Coudé refractor at the observatory has been extensively used for educational purposes. This telescope has been instrumental in supporting both undergraduate and postgraduate students from Egyptian universities. Furthermore, it has played a crucial role in training sessions for NRIAG researchers. Additionally, it has been a key component in various international summer schools organized in partnership with the International Astronomical Union [16][17][18].
Figure 1 illustrates the Coudé refractor telescope, which features a 15 cm objective lens and a focal length of 225 cm, and produces images with a diameter of 25 cm. The figure shows the telescope in operation together with three representative samples of the daily sunspot number (SSN) observations it collects. These samples, captured at different times, illustrate the dynamic nature of solar activity as monitored by the telescope. Specialists periodically calculate and analyze this data, and monthly reports are sent to international centers.

Time-series similarity and imputation
In time-series analysis, Dynamic Time Warping (DTW) [19,20] and Orthogonal Regression (OR) [21,22] are two established methods for addressing irregularities in time-series data. DTW measures the similarity between temporal sequences by aligning them non-linearly, thus accommodating expected variations in timing. This makes it particularly useful in applications where temporal alignment is crucial, such as the comparison of sunspot number records. OR, on the other hand, offers a symmetrical approach to data imputation, minimizing errors by considering uncertainties in both dependent and independent variables. Combining DTW with OR offers a robust approach for improving the analysis of sunspot number time-series data, especially in situations involving missing data or temporal mismatches [23,24].

Dynamic Time Warping (DTW) similarity
Dynamic Time Warping (DTW) is an algorithm in time-series analysis, primarily used for quantifying the similarity between two temporal sequences such as local and global reported sunspot numbers. It is designed to align these sequences in a way that minimizes the overall distance between them, effectively handling variations in length.
Let us consider two time series, $Q = \{q_1, q_2, \ldots, q_m\}$ and $C = \{c_1, c_2, \ldots, c_n\}$, with lengths $m$ and $n$ respectively. DTW aims to find a warping path that aligns elements of $Q$ with those of $C$, minimizing the cumulative distance across these alignments [20,25], as expressed in the equation below:

$$\mathrm{DTW}(Q, C) = \min_{\phi \in \Phi} \sum_{(i,j) \in \phi} d(q_i, c_j)$$

where $\Phi$ denotes the set of all possible warping paths and $d(q_i, c_j)$ is the distance between the aligned elements $q_i$ and $c_j$. The minimization over $\phi \in \Phi$ searches for the path $\phi$ that yields the smallest cumulative distance.
The DTW algorithm calculates this minimum distance by constructing and populating a distance matrix $D$. Each element $D(i, j)$ in this matrix represents the distance between $q_i$ and $c_j$, plus the minimum cumulative distance to reach this point from the start of the sequences. This is defined by the recursive formula:

$$D(i, j) = d(q_i, c_j) + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$$

In this context, $d(q_i, c_j)$ computes the distance between the individual elements of the two series. The minimization within the formula ensures that each step in the matrix selects a path that aggregates the least possible distance from the beginning to the current position. Through these mechanisms, DTW provides an effective means to compare time-series data. It allows for elastic adjustments along the temporal axis, accommodating disparities in timing and pacing between the series, hence offering a more comprehensive and accurate measure of similarity than traditional linear comparison methods.
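As an illustration of the recursion described above, the following minimal Python sketch fills the accumulated distance matrix and returns the DTW distance. The function name `dtw_distance` and the use of the absolute difference as the local distance d(q_i, c_j) are assumptions made for this example; the source does not state which local distance or software was used.

```python
def dtw_distance(q, c):
    """Return the DTW distance between sequences q and c.

    Assumption: the local distance d(q_i, c_j) is the absolute difference.
    """
    m, n = len(q), len(c)
    inf = float("inf")
    # D has one extra row and column so that D[0][0] = 0 starts the recursion.
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(q[i - 1] - c[j - 1])
            # Local cost plus the cheapest of the three admissible predecessors.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[m][n]


# Two series of different length that trace the same shape align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the second series merely repeats one value, the warping absorbs the difference in length, which is the property that makes DTW suitable for series sampled at different rates.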

Regression imputation
Time-series data, with its inherent sequential nature and autocorrelation, poses challenges for imputation that traditional methods do not address. Orthogonal regression, also known as total least squares, presents a robust solution by considering errors in both dependent and independent variables [22].
Diverging from the conventional ordinary least squares (OLS) method, which minimizes vertical distances from data points to the regression line, orthogonal regression minimizes the perpendicular distances. This distinction is crucial, especially in scenarios where both variables in the regression are subject to uncertainties or measurement errors. By adopting this approach, orthogonal regression provides a more accurate and reliable means for imputing missing values in time-series data, thereby enhancing the overall integrity and usefulness of the dataset. The implementation of orthogonal regression in time-series imputation addresses a common and critical issue in statistical analysis and data science, ensuring higher accuracy and reliability in the treatment of temporal data [26].
Consider a time series $Y_t$ with missing values. The goal is to estimate these missing values ($Y_{t,\mathrm{missing}}$) using the observed data ($Y_{t,\mathrm{observed}}$) and possibly other correlated time series $X_t$. The orthogonal regression model can be expressed as [22]:

$$Y_t = \beta X_t + \epsilon_t$$

where $Y_t$ represents the dependent time series, $X_t$ indicates the independent time series (or lagged values of $Y_t$), $\beta$ is the regression coefficient, and $\epsilon_t$ denotes the error term.
The core objective in orthogonal regression is to minimize the sum of squared perpendicular distances:

$$\min_{\beta} \sum_{t} d_{\perp}(Y_t, \beta X_t)^2$$

Here, $d_{\perp}(Y_t, \beta X_t)$ is the perpendicular distance from a point to the regression line.
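The objective above can be illustrated with a short Python sketch. It assumes a line through the origin (no intercept, matching the model as written) and equal error variance in both variables, in which case the slope that minimizes the squared perpendicular distances has a closed form. The function names `orthogonal_slope` and `perpendicular_distance` are hypothetical and are not taken from the source.

```python
import math


def orthogonal_slope(x, y):
    """Slope beta of y = beta * x minimizing squared perpendicular distances.

    Assumptions: no intercept and equal error variance in x and y.
    """
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when sum(x * y) is zero")
    diff = syy - sxx
    # Root of sxy*b^2 + (sxx - syy)*b - sxy = 0 that has the sign of sxy.
    return (diff + math.sqrt(diff * diff + 4.0 * sxy * sxy)) / (2.0 * sxy)


def perpendicular_distance(x, y, beta):
    """Perpendicular distance from the point (x, y) to the line y = beta * x."""
    return abs(y - beta * x) / math.sqrt(1.0 + beta * beta)


# Points lying exactly on y = 2x recover a slope of 2.
beta = orthogonal_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(beta)
```

A missing value of the dependent series would then be estimated as `beta * x_t` from the corresponding value of the independent series. With an intercept, the same formula applies after centering both series on their means.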

Evaluation metrics
The performance of orthogonal regression can be assessed using the root mean squared error (RMSE), which measures the average magnitude of the residuals (prediction errors) between the observed and predicted values [27]. The RMSE is calculated as the square root of the average squared differences between the actual and predicted values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

where $y_i$ represents the actual observed values, $\hat{y}_i$ denotes the predicted values obtained from the regression model, and $n$ is the total number of observations. A lower RMSE indicates a better fit of the model to the data, as it implies that the predicted values are closer to the actual values. In the context of orthogonal regression, assessing the RMSE allows researchers to gauge the model's accuracy and its ability to account for errors in both the dependent and independent variables. By minimizing the RMSE, the model can improve its predictive performance and provide more reliable insights into the data [27,28].
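For concreteness, the RMSE definition can be written as a small Python function. This is a minimal sketch; the name `rmse` and the input checks are choices made for this example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Every prediction is off by exactly one unit, so the RMSE is one.
print(rmse([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]))
```

Because the differences are squared before averaging, a few large errors raise the RMSE more than many small ones.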

Data collection and exploratory analysis
The SIDC-SILSO database stands as a leader in the field of global sunspot observation and is internationally recognized as the definitive source for sunspot data. It works in collaboration with a wide network of observatories to create a detailed and extensive database (figure 2(a)). This valuable collection of data is subject to stringent quality control measures to ensure its accuracy and consistency. Once verified, the data is integrated into the SILSO database, making it available worldwide. At the National Research Institute of Astronomy and Geophysics (NRIAG), astronomers employ a manual method to gather and document sunspot data (figures 1(b), (c), and (d)). They use solar telescopes outfitted with specialized filters (figure 1(a)) to conduct detailed observations and tracking of sunspots, paying close attention to features such as size, shape, and location. NRIAG adheres to strict quality control procedures to ensure the reliability of the data. These procedures include the elimination of errors and the assessment of anomalous data points. It is important to note that systematic data storage and archiving, complete with extensive metadata, began in earnest in 2010. This marked a significant step forward in enabling easy access to the data and supporting future research projects.
Table 1 provides a detailed statistical analysis of the local and global sunspot number dataset, encompassing the period from 2010 to 2023. The dataset is centered around 2016, based on its median value, indicating a significant temporal focus during this period. Regarding the Global Sunspot Number (GSSN), the data exhibits considerable variability, highlighted by a mean of 55.81 and a high standard deviation of 49.52. The range of this variability, stretching from 0 to 240, underlines the dynamic nature of sunspot activity on a global scale. The median value being lower than the mean suggests a right-skewed distribution, implying that periods of high sunspot numbers, while less frequent, significantly influence the average. This skewness points to the sporadic nature of intense solar activity. In parallel, the Local Sunspot Number (LSSN) shows a lower average value of 12.54 with a standard deviation of 12.74, indicating variability but on a less dramatic scale compared to the GSSN. The median value for LSSN, like its global counterpart, is lower than the mean, revealing a right-skewed distribution. This distribution pattern, within a range of 0 to 100, indicates a more constrained but still significant variability in local sunspot activity.
Figure 2 offers insightful perspectives into the patterns of sunspot occurrences, revealing key aspects of solar activity dynamics. Figure 2(a) focuses on the global daily sunspot numbers from 2010 to 2023, illustrating a clear cyclical pattern that mirrors the well-known approximately 11-year solar cycle. This cycle is characterized by periodic fluctuations in solar activity, as evidenced by the rise and fall in the number of sunspots. The temporal distribution of GSSN shows discernible peaks and troughs, with significant daily variability around each peak, indicative of the transient and complex processes on the solar surface. A critical observation from this analysis is the identification of solar activity peaks, with a notable maximum around 2014 and another expected peak around 2024. These peaks confirm the periodic nature of the solar cycle. Conversely, the minima, particularly around 2019-2020, demonstrate marked reductions in sunspot numbers, aligning with  , constituting a significant fraction of the data. This absence of information poses considerable challenges for conducting comprehensive analysis and achieving accurate interpretations. The dynamic and fluctuating nature of solar activity means that these data gaps are particularly problematic, potentially leading to incomplete or distorted views of solar activity trends. The local sunspot data, with its distinctive characteristics and lower sunspot counts, requires careful analysis, especially considering the regional factors that may influence sunspot observations. These differences highlight the limitations of the observational range and emphasize the importance of accounting for these regional discrepancies in the analysis and interpretation of the data.
Comprehending the disparities between the Global Sunspot Number (GSSN) and Local Sunspot Number (LSSN) datasets provides a detailed and nuanced insight into both global and local solar activities [32]. The conspicuous variability observed in the GSSN, along with the notable data gaps in the LSSN, accentuates the necessity for prudent interpretation. Addressing these challenges to bolster the analysis's accuracy necessitates the application of sophisticated data processing techniques. The use of advanced imputation methods, in particular, can be instrumental in this context. Implementing such methodologies would enable a more accurate and exhaustive understanding of the trends and implications inherent in solar activity patterns.

Results and discussion
The analysis of sunspot numbers (SSN) depicted in figures 3(a) to (f) provides an intricate understanding of the dynamics of solar activity. These temporal distributions of SSN, through their detailed time series analyses, capture both the short-term fluctuations and longer-term trends in solar activity, offering insights into the complex nature of solar phenomena.
Figure 3(a) presents a time series analysis of global daily sunspot numbers over day of the month, illustrating the fluctuations in sunspot activity. The line graph, complemented by a shaded area indicating variability or uncertainty, shows a discernible trend of increasing activity, peaking around mid-month, followed by a plateau. This pattern of periodicity suggests a cyclical nature of solar activity within the month, with significant daily fluctuations. In contrast, figure 3(b) focuses on local daily total sunspot numbers, demonstrating a similar cyclical pattern but with lower overall values. The graph's dense data points, coupled with the shaded area, suggest variability in sunspot counts and possibly measurement uncertainties. The slight offset in peak timings from the global data could be attributed to the localized nature of the data collection. Figures 3(c) and (d) extend the analysis to the monthly scale, revealing non-linear and cyclical patterns with noticeable peaks and troughs. These figures suggest a non-stationary process influenced by the solar cycle, with potential correlations to solar rotational dynamics or magnetic cycle patterns. The seasonality observed is pivotal for correlating solar activity with various space and terrestrial weather phenomena. Figures 3(e) and (f) provide a broader temporal context, highlighting the natural 11-year solar cycle through the observed rise and fall of sunspot numbers. These patterns align with the expected progression of the current solar cycle, emphasizing the importance of monitoring such trends for their implications on space weather and its effects on satellite operations and Earth-based communication systems.
Figure 4 showcases a thorough comparative analysis of global and local sunspot numbers (SSN) utilizing boxplots, also referred to as box-and-whisker plots. These plots serve as an efficient method for illustrating the distribution of SSN data across daily, monthly, and yearly intervals. Boxplots convey the data's dispersion through a five-number summary, which includes the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum values. This comprehensive approach not only emphasizes the range and variability of the data but also illuminates the presence and extent of outliers. Moreover, these visual representations provide valuable insights into the data's symmetry, the density of data clustering, and the potential for skewness within the SSN datasets, offering a clear and concise overview of the statistical characteristics of global versus local sunspot activity over time. Figures 4(a) and (b) illustrate the daily variability of GSSN and LSSN, respectively. The GSSN boxplots reveal moderate fluctuations in median values, reflecting the variable nature of solar activity, yet the consistent interquartile ranges (IQR) suggest a stable distribution of sunspot counts day-to-day. Despite this variability, the uniformity of the IQRs indicates a level of predictability in the GSSN. Outliers, which are apparent in the data, mark occasional extremes in SSN counts. In comparison, the LSSN boxplots display generally lower median values, with a concentration toward the lower quartile, indicating a distribution skewed towards fewer sunspots. The IQRs are notably narrow, signifying less variability in LSSN activity on a daily basis. However, the presence of numerous outliers above the whiskers highlights significant, albeit sporadic, spikes in local sunspot activity. These outliers represent potential observational inconsistencies. Figures 4(c) and (d) provide a comparative analysis of monthly solar activity, examining both GSSN and LSSN over the course of a
year. The GSSN boxplots reveal variability in median values from month to month, which could be attributable to the solar seasonal impacts, with the IQRs indicating the consistency of daily variations. This variability, in conjunction with the outliers, underscores the episodic nature of heightened sunspot activity. The whiskers' lengths differ across the months, signaling an asymmetrical distribution of SSN, and suggesting that certain months experience a broader range of solar activity. Conversely, the LSSN boxplots show consistently lower activity levels, with median values frequently situated near the lower quartile, which implies a reduced variability in local sunspot activity. Nonetheless, the occurrence of outliers indicates
The process of decomposing sunspot number (SSN) data into its fundamental temporal elements is crucial for deepening our understanding of the inherent patterns and commonalities in solar activity. This detailed decomposition is instrumental in pinpointing periodic behaviors and long-term trends, as well as in identifying   5), we can more clearly observe the core similarities and differences. This analytical approach allows for a more nuanced understanding of the dynamics at play in both global and local contexts of sunspot activity. Figure 5(a) provides a detailed time series decomposition of the GSSN from 2010 to 2023, elucidating the intricate dynamics of solar activity. This decomposition divides the data into three essential components: trend, seasonality, and residuals, each uncovering specific characteristics of the Sun's behavior. The first panel displays the original SSN data, characterized by notable variability and distinct peaks of increased sunspot activity, highlighting the Sun's dynamic and ever-changing surface phenomena. This panel vividly captures the complexity and fluctuating nature of solar activity. The second panel, focusing on the trend component, shows a discernible shift: a declining trend until around 2020, followed by an
upward trajectory. This shift mirrors the typical solar cycle, which spans around 11 years and includes periods of both heightened and reduced solar activity. The transition from a decreasing to an increasing trend is consistent with the expected movement from a solar minimum to a solar maximum phase within the observed period. In the third panel, the seasonality of the data is brought to the fore, demonstrating regular, consistent fluctuations over time. These periodic variations do not display a significant long-term trend, hinting that they might stem from regular temporal patterns, possibly linked to Earth's orbital behavior or axial tilt. The stability of these fluctuations emphasizes certain predictable elements in solar activity. The final panel illustrates the residuals, capturing the unpredictable components of the data. These residuals represent the portion of variance not explained by trend and seasonality, underscoring the complex and sometimes erratic aspects of solar behavior that go beyond regular cycles and predictable trends. This comprehensive decomposition is instrumental in understanding the multifaceted nature of solar activity, from its predictable patterns to its more unpredictable fluctuations. Figure 5(b) presents a time series decomposition of the LSSN covering the same period, offering a comparative perspective to the GSSN analysis. This decomposition similarly categorizes the data into trend, seasonality, and residual components, each providing unique insights into local solar activity. The trend component exhibits a non-linear behavior with an overall declining trajectory, indicative of a long-term decrease in local sunspot activity. This pattern corresponds with the known phases of the solar cycle, capturing the alternating periods of heightened and diminished sunspot activity, and reflects the localized manifestation of solar dynamics. In the seasonality component, there are consistent periodic fluctuations, yet these lack a clear long-term
trend. This indicates regular seasonal variations in the local sunspot numbers, which could be related to Earth's orbital dynamics and their influence on localized solar observations. The absence of a pronounced long-term trend in this component highlights the cyclical nature of these seasonal patterns. The residuals reveal irregular spikes, most prominently in the latter part of the time series. These spikes could be related to transient solar phenomena or anomalies in data collection. The sporadic nature of these residuals, devoid of any discernible pattern, suggests that the deterministic parts of the time series (trend and seasonality) have been effectively isolated, leaving behind the stochastic or unexplained variance. This residual analysis emphasizes the unpredictability and complexity inherent in local solar activity, beyond the scope of regular cycles and trends.
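The split into trend, seasonality, and residuals discussed above can be sketched with a classical additive decomposition based on a centered moving average. The source does not name the software or the seasonal period it used, so the function `additive_decompose`, the odd `period` argument, and the synthetic series below are all assumptions made for illustration.

```python
def additive_decompose(series, period):
    """Split a complete series into trend, seasonal and residual parts.

    Assumptions: additive model, odd period, no missing values.
    Trend and residual are None near the edges, where the window is incomplete.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        trend[t] = sum(window) / period
    # Average the detrended values separately for each position in the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    means = [s / k if k else 0.0 for s, k in zip(sums, counts)]
    offset = sum(means) / period  # center the seasonal effects on zero
    seasonal = [means[t % period] - offset for t in range(n)]
    resid = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid


# Synthetic example: linear trend plus a repeating pattern of period 3.
pattern = [1.0, 0.0, -1.0]
data = [10.0 + 0.5 * t + pattern[t % 3] for t in range(12)]
trend, seasonal, resid = additive_decompose(data, 3)
print(seasonal[:3])
```

On this noise-free synthetic input the seasonal component reproduces the repeating pattern and the interior residuals are numerically zero; on real sunspot data the residuals carry the irregular spikes described in the text.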
The comparative analysis of the decomposed components of Global Sunspot Number (GSSN) and Local Sunspot Number (LSSN) has unveiled cyclical trends that align with the well-documented approximately 11-year solar cycle. This analysis reveals a rhythmic increase and decrease in sunspot activity. Notably, the fluctuations within the GSSN data are more pronounced, indicating stronger variability. In contrast, the LSSN data exhibits a smoother and more stable trend, although it still echoes the global pattern, showcasing a level of synchronicity between global and local solar phenomena.
The seasonal components extracted from both GSSN and LSSN data display regular oscillations, indicating that sunspot activity is influenced by common seasonal factors. However, the amplitude of these seasonal variations is significantly higher in the GSSN data. This observation suggests that while both global and local sunspot numbers are affected by seasonal changes, the global data is more sensitive to these seasonal influences. Alternatively, it could mean that local sunspot activity is moderated by additional localized factors, which could diminish the impact of seasonal variations.
Regarding the residual components, which account for the random or irregular variations within the SSN data, both the GSSN and LSSN series exhibit erratic behavior. The patterns of these residuals do not show a clear correlation, with each series displaying its unique spikes and troughs at differing times. This indicates that local conditions and random disturbances significantly influence sunspot numbers, and these influences do not necessarily manifest concurrently on global and local scales.
In the comparative analysis of Global Sunspot Number (GSSN) and Local Sunspot Number (LSSN) across different time frames (daily, monthly, and annual), the study evaluated outliers, uncertainties, and the decomposed components of these sunspot numbers. Following this comparative evaluation, Dynamic Time Warping (DTW) analysis was introduced as a method for aligning the temporal sequences of GSSN and LSSN. This temporal alignment marks a crucial step before proceeding to the final stage of imputing missing values in the LSSN dataset.
DTW's role in this context is pivotal for accurately matching the time series of GSSN and LSSN, enabling the discovery of patterns and correlations that might not be evident through standard time series analysis techniques. By establishing a precise temporal correlation with DTW, the process significantly enhances the control and accuracy of the imputation technique for filling in missing LSSN data. This methodical approach ensures that the imputation not only reflects observable patterns and trends but is also adjusted to incorporate the intricate temporal dynamics that influence sunspot occurrences and recordings.
Employing DTW for fine-tuning the imputation strategy considerably strengthens the integrity of the reconstructed LSSN time series. This enhanced methodological rigor provides a more solid foundation for further analyses and modeling of solar activity, ensuring that subsequent efforts are built upon a dataset that accurately represents the complex temporal behaviors of sunspot numbers.
In a DTW analysis, the optimal match between two time series is found such that the sequences are aligned in a way that minimizes the cumulative distance between them. The warping paths represented in the visualization are the many possible alignments that DTW has considered to synchronize the two datasets. Figure 6(a) illustrates all the warping paths between Global Sunspot Number (GSSN) and Local Sunspot Number (LSSN) data, providing a robust method for examining the alignment and similarities between these two time series. The dense clustering of warping paths suggests regions where the GSSN and LSSN are closely aligned, indicating similar patterns of sunspot activity between global and local measurements. The spread or divergence of the paths at certain points implies periods of discrepancy where the local and global data do not align as closely, possibly due to local phenomena or measurement differences that cause a deviation from the global trend. This analysis highlights the dynamic relationship between GSSN and LSSN, providing insights into how local sunspot activities can both follow and diverge from global patterns. The areas of tight clustering can be of particular interest, as they indicate strong correlation which could be used to predict LSSN from GSSN data or vice versa, especially useful for imputing missing values or forecasting. Conversely, areas with wide divergence might be key periods for investigating specific local solar events or anomalies that cause deviation from the expected global activity pattern. Figure 6(b) depicts the Dynamic Time Warping (DTW) accumulated cost matrix, a tool for analyzing the optimal sequence alignment between the Global Sunspot Number (GSSN) and Local Sunspot Number (LSSN) time series. This matrix is a comprehensive aggregation of the point-wise distances between the two time series under comparison. Each matrix element aggregates the cost from the origin (0,0) at the top left corner, which denotes the
inception of both series, to that point, encapsulating the notion of a cumulative cost. The prominent red line delineating a trajectory through the matrix signifies the DTW path, which is the most cost-effective route that minimizes the overall distance between the time series. This path is the result of a series of decisions, selecting at each juncture the step (horizontal, vertical, or diagonal) that perpetuates the minimal cumulative cost. This path is not constrained by a strict one-to-one temporal correspondence, allowing the algorithm to 'warp' time dynamically to achieve an optimal alignment. The periphery of the matrix generally defines the warping window, imposing potential restrictions on the path to prevent over-warping. The absence of such constraints in this instance, as the path spans the full extent of the matrix, permits unrestricted alignment between the series. The value 'Dist=1556.9114', annotated on the figure, quantifies the total cumulative distance as determined by the DTW path, providing a metric of dissimilarity between the GSSN and LSSN. Lower values correspond to greater alignment and thus higher similarity between the series, whereas higher values indicate greater dissimilarity. Additionally, the marginal distributions plotted along the matrix's left and top edges offer insights into the distribution of DTW distances for each time series independently. These plots illuminate the locations within the time series that contribute most significantly to the overall DTW distance, with pronounced peaks symbolizing zones of substantial cumulative distance. The DTW accumulated cost matrix and the resultant path elucidated here are indispensable for discerning the alignment of the series beyond mere visual comparison, revealing temporal shifts and event speed variations. This analysis is invaluable for identifying latent similarities and discrepancies within the time series, thereby enhancing our understanding of the temporal dynamics at play between global and
local sunspot activities.
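The recurrence behind the accumulated cost matrix and the backtracked path can be sketched as follows. This is a minimal, unconstrained DTW written in plain NumPy with the absolute difference as local cost; the function name `dtw` and the choice of local cost are illustrative assumptions, not the implementation used to produce figure 6.

```python
import numpy as np


def dtw(x, y):
    """Return (total distance, accumulated cost matrix, warping path) for two 1-D series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # acc[i, j] holds the minimal cumulative cost of aligning x[:i] with y[:j].
    # The extra row and column of infinities act as a border.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # local distance (assumed: absolute difference)
            acc[i, j] = cost + min(
                acc[i - 1, j],      # vertical step
                acc[i, j - 1],      # horizontal step
                acc[i - 1, j - 1],  # diagonal step
            )

    # Backtrack from the last cell to the first, always moving to the
    # cheapest predecessor, to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()

    return acc[n, m], acc[1:, 1:], path
```

For instance, `dtw([0, 1, 2, 3], [0, 0, 1, 2, 3])` yields a total distance of 0, because the repeated leading value in the second series is absorbed by a single horizontal step rather than being penalized as a mismatch.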
Conventional imputation methodologies typically do not sufficiently accommodate the inherent sequential dependencies and distinct temporal patterns characteristic of time series datasets. To address the deficiencies of traditional techniques and attain the ultimate objective of accurately imputing missing values in the local sunspot number (LSSN) dataset, we deployed sophisticated orthogonal regression imputation strategies. These techniques are meticulously designed to honor the intrinsic temporal architecture of the time series data. Orthogonal regression, unlike standard regression methods, accounts for errors in both explanatory and dependent variables, thus providing a more robust framework for dealing with the uncertainties present in time series data. By applying this advanced approach, we ensure that the imputation of missing LSSN values is not merely a function of cross-sectional data points but is instead deeply integrated with the chronological progression and fluctuation patterns exhibited by the dataset. Such a methodologically rigorous approach to imputation is paramount in preserving the temporal integrity of the LSSN series, thereby enabling a more authentic reconstruction of the data. This preserves the fidelity of the dataset's historical trends and cyclical behaviors, which are critical for high-precision solar activity analysis and forecasting.
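To make the contrast with ordinary least squares concrete, the orthogonal regression criterion can be written as below. The notation is introduced here purely for illustration: x_i denotes the GSSN value and y_i the observed LSSN value on day i, and β₀ and β₁ are the intercept and slope. The form shown assumes equal error variances in both variables. The text does not state the exact formulation used, so this should be read as the textbook version rather than the study's own.

```latex
% Orthogonal regression: minimize squared perpendicular distances to the line
\min_{\beta_0,\,\beta_1} \; \sum_{i=1}^{n} \frac{\left(y_i - \beta_0 - \beta_1 x_i\right)^2}{1 + \beta_1^{2}}

% Closed-form solution (valid for s_{xy} \neq 0)
\hat{\beta}_1 = \frac{s_{yy} - s_{xx} + \sqrt{\left(s_{yy} - s_{xx}\right)^2 + 4\, s_{xy}^{2}}}{2\, s_{xy}},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

% Sample moments
s_{xx} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \quad
s_{yy} = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2, \quad
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

Ordinary least squares minimizes only the numerator, that is, the vertical residuals. Dividing by 1 + β₁² converts each vertical residual into a perpendicular distance, which is what allows measurement error in x to be treated on the same footing as error in y.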
The imputation process utilized in this research unfolds through a meticulously structured sequence of actions. Initially, the LSSN time series undergoes seasonal decomposition using an additive model, which disentangles the data into trend, seasonal, and residual elements. This step is particularly appropriate for LSSN, where seasonal fluctuations are anticipated to influence the overarching trend and incidental variations. Subsequently, the regression is applied to interpolate the trend and residual components, temporarily setting aside the seasonal component. During this phase, orthogonal interpolation is applied, a technique especially proficient at preserving the time series' core structure while precisely estimating missing values. This method commonly employs mathematical functions or polynomials to reconstruct absent data points in a manner that minimizes any disruption to the data's fundamental progression. The process culminates with the reintegration of the seasonal component. This stage is pivotal for accurately reflecting the LSSN data's intrinsic seasonality within the completed dataset. By restoring the original attributes of the time series, especially the cyclical patterns intrinsic to SSN data, the integrity and authenticity of the time series are maintained. Figure 7 illustrates the outcomes of the regression process that has been implemented.
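The three stages described above (set the seasonal component aside, regress, restore seasonality) can be sketched in a few lines. This is a simplified illustration under several assumptions that are not taken from the text: per-phase means stand in for a full additive seasonal decomposition, the reference series (GSSN) is used as the regressor, error variances are assumed equal, and the names `orthogonal_fit`, `impute_with_seasonality` and `period` are hypothetical.

```python
import numpy as np


def orthogonal_fit(x, y):
    """Fit y = slope * x + intercept by minimizing perpendicular distances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    centered = np.column_stack([x - x_mean, y - y_mean])
    # The first right singular vector is the direction of largest variance,
    # i.e. the direction of the orthogonal-regression line.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx  # undefined for a perfectly vertical line (dx == 0)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def impute_with_seasonality(local, reference, period):
    """Fill NaNs in `local` using `reference`, preserving a simple seasonal pattern."""
    local = np.asarray(local, dtype=float)
    reference = np.asarray(reference, dtype=float)
    observed = ~np.isnan(local)
    phase = np.arange(len(local)) % period

    # Stage 1 (simplified decomposition): seasonal component estimated as the
    # mean deviation of the observed values at each phase of the cycle.
    overall_mean = local[observed].mean()
    seasonal_by_phase = np.zeros(period)
    for p in range(period):
        in_phase = observed & (phase == p)
        if in_phase.any():
            seasonal_by_phase[p] = local[in_phase].mean() - overall_mean
    seasonal = seasonal_by_phase[phase]

    # Stage 2: orthogonal regression of the deseasonalized local series on the
    # reference series, fitted on observed days and evaluated on missing days.
    deseasonalized = local - seasonal
    slope, intercept = orthogonal_fit(reference[observed], deseasonalized[observed])
    filled = deseasonalized.copy()
    filled[~observed] = slope * reference[~observed] + intercept

    # Stage 3: restore the seasonal component.
    return filled + seasonal
```

Observed values pass through unchanged, since the seasonal component that is subtracted in the first stage is added back in the last; only the missing entries are replaced by regression estimates.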
By employing this sophisticated methodology, the research effectively addresses the challenge of imputing missing values in the LSSN time series. This approach ensures that the unique temporal features of the data, including trends and seasonality, are preserved in the imputed values. This level of attention to the intrinsic properties of the time series data is essential in areas with strong seasonal influences, such as solar activity studies. It leads to more accurate and meaningful analytical results, providing deeper insights into the patterns and behaviors associated with solar phenomena.
In figure 7(a), the detailed juxtaposition of the global sunspot number (GSSN) against the imputed local sunspot number (LSSN), presented as a background reference, is critical in assessing the precision and dependability of the imputation methodology applied to the LSSN dataset. This comparison is essential in the field of astrophysical research, where precision in sunspot monitoring is of utmost importance. The analysis is designed to meticulously examine the level of alignment between the imputed LSSN and the GSSN, providing insights into the effectiveness of the imputation technique. The degree of congruence or the discrepancies observed between these datasets is a significant indicator of the reliability and precision of the imputation process applied to the local SSN. Furthermore, figure 7(b) delves into a critical aspect of the research by comparing the actual collected LSSN data with the imputed LSSN values. This comparison is vital for assessing the accuracy of the imputation method in reflecting the real observed data. It serves as a benchmark to validate the effectiveness of the imputation technique, ensuring that the imputed values not only correlate well with the broader trends observed in the global SSN but also align closely with the specific local observations of the SSN.
The root mean squared error (RMSE) obtained from the orthogonal regression analysis between local sunspot numbers (LSSN) and global sunspot numbers (GSSN) was calculated to be 0.4375. This relatively low RMSE value indicates a high level of agreement and precision in the regression model, suggesting that the relationship between LSSN and GSSN is accurately captured. The consistency between the local and global data sets, as reflected by the RMSE, supports the reliability and validity of the regression approach. Additionally, the low RMSE value enhances confidence in the quality of the imputed local sunspot numbers and their alignment with the established global sunspot data. This improvement in the regression analysis contributes to a more nuanced understanding of solar activity and its effects, offering insights that can be valuable for further research and practical applications in space weather forecasting and satellite operations.
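For reference, the RMSE is conventionally defined as below, where y_i is the observed LSSN on day i, ŷ_i is the value predicted by the regression, and n is the number of days with observations. The symbols are introduced here for illustration. The text does not say whether residuals were measured vertically or perpendicular to the fitted line, so the vertical form shown is an assumption.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}
```

Because the RMSE is expressed in the same units as the data, its magnitude is best judged relative to the range of the series being compared.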
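Together, these comparisons in figure 8 provide a comprehensive evaluation of the imputation method's performance. The histogram analysis, as depicted in figure 8(a), examines the frequency distribution of the discrepancies between the observed local sunspot number (LSSN) and the imputed sunspot number (SSN). The distribution exhibits a near-normal profile with its central tendency clustering around zero. This pattern suggests a general alignment of the imputed SSN with the observed LSSN values. However, the data's dispersion and a minor rightward skew indicate some variability in the imputation model's accuracy, potentially signaling a systematic underestimation in certain instances. In parallel, the Bland-Altman analysis, presented in figure 8(b), evaluates the concordance between the original and imputed SSN datasets. This analysis plots the average of the original and imputed values against their differences. The central tendency of these differences, along with the limits of agreement, is demarcated by red dashed lines. The central line's closeness to zero indicates an absence of significant systematic bias across the spectrum of measurements. Nevertheless, the dispersion of data points and the occurrence of outliers beyond the agreement limits highlight specific cases where the imputed values markedly deviate from the original ones. Notably, the scatter tends to increase with higher mean values, hinting at a potential proportional bias wherein the discrepancy between methods varies depending on the magnitude of the measurement [33]. Collectively, these analytical methods provide a comprehensive evaluation of the imputation model's efficacy, shedding light on its performance characteristics and areas for potential refinement.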
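The agreement limits referred to above are those of a Bland-Altman analysis, and the underlying quantities are straightforward to compute. The sketch below assumes the conventional 95% limits (mean difference ± 1.96 standard deviations) and takes the difference as observed minus imputed; the function name `bland_altman` and both conventions are illustrative assumptions rather than details taken from the study.

```python
import numpy as np


def bland_altman(observed, imputed):
    """Return the quantities plotted in a Bland-Altman diagram.

    Returns (means, diffs, bias, (lower, upper), outside), where `outside`
    flags the points that fall beyond the limits of agreement.
    """
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)

    means = (observed + imputed) / 2.0   # x-axis: average of the two values
    diffs = observed - imputed           # y-axis: difference (assumed sign convention)

    bias = diffs.mean()                  # central line; near zero means no systematic bias
    sd = diffs.std(ddof=1)               # sample standard deviation of the differences
    lower = bias - 1.96 * sd             # lower limit of agreement
    upper = bias + 1.96 * sd             # upper limit of agreement

    outside = (diffs < lower) | (diffs > upper)
    return means, diffs, bias, (lower, upper), outside
```

A proportional bias of the kind mentioned above would show up as a trend in `diffs` when plotted against `means`, for example a widening spread at larger mean values.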

Conclusions
The current research highlights the critical role of sunspot number (SSN) as an essential indicator for understanding solar activity, effectively marrying traditional knowledge with contemporary analytical methods. Drawing from historical records, such as those from ancient Egypt, the study of SSN is enriched with a deep historical context, shedding light on its enduring significance in solar observations. This historical perspective enhances our appreciation for the contribution of local SSN measurements to modern solar behavior studies. A key innovation in this research has been the implementation of dynamic time warping (DTW) as a fundamental technique for the temporal analysis of local and global SSN time-series data. DTW's capacity to identify non-linear temporal alignments has enabled a detailed examination of local solar activity patterns and anomalies, providing insights that surpass those obtained from traditional linear analysis techniques.
Moreover, the application of orthogonal regression for filling in missing values in local SSN datasets represents a significant advancement in maintaining data integrity. By prioritizing the preservation of the original data structure and relationships, this method has improved the accuracy of local solar activity models, ensuring that the imputed values faithfully represent the actual solar events.
The inclusion of the Bland-Altman method to evaluate the concordance between imputed and observed local SSN (LSSN) data has been another crucial aspect of this study. This approach offers a clear, visual examination of the discrepancies between time-series data, providing an efficient means to verify the precision of the imputation techniques employed. It has proven to be an invaluable tool in affirming the models' reliability and validity.
By integrating reported local SSN data with cutting-edge time-series analysis and robust data imputation methods, this study has developed a comprehensive and accurate depiction of solar activity. This strategy demonstrates a notable blend of historical insights and modern analytical capabilities, making a substantial contribution to our collective understanding of solar phenomena.

Figure 1. Recording Coudé refractor telescope and three samples of collected sunspot number (SSN) activity at various times.

Figure 2. Comparative analysis of the temporal distribution of collected global and local daily total sunspot numbers.

Figure 3. Comparative analysis of the temporal patterns in global versus local sunspot numbers across daily, monthly, and yearly intervals. This analysis delves into the patterns and variations observed in sunspot numbers on a global scale compared to those recorded at a local level, scrutinizing the data at daily, monthly, and annual resolution.

Figure 4. Detailed examination of boxplots illustrating the variations in global and local sunspot numbers over daily, monthly, and yearly periods.
outliers and irregularities within the SSN time series. By separating both the global sunspot number (GSSN) and the local sunspot number (LSSN) data into their trend, seasonal, and residual components (as illustrated in figure

Figure 5. Comparative analysis of decomposed components of global and local daily sunspot numbers.

Figure 6. Dynamic time warping (DTW) analysis comparing global and local daily total sunspot numbers.
(a), examines the frequency distribution of the discrepancies between the observed local sunspot number (LSSN) and the imputed sunspot number (SSN). The distribution exhibits a near-normal profile with its central tendency clustering around zero. This pattern suggests a general alignment of the imputed SSN with the observed LSSN values. However, the data's dispersion and a minor rightward skew indicate some variability in the imputation model's accuracy, potentially signaling a systematic underestimation in certain instances. In parallel, the Bland-Altman analysis, presented in figure 8(b), evaluates the concordance between the original and imputed SSN datasets. This analysis plots the average of the original and imputed values against their differences. The central tendency of these differences, along with the limits of agreement, is demarcated by red dashed lines. The central line's closeness to zero indicates an absence of

Figure 7. Comparative analysis of daily total sunspot numbers: observed global, observed local (in blue), and imputed values (in red).

Figure 8. Evaluation of the accuracy of the imputation process.

Table 1. Comprehensive descriptive statistics and missing value (NaN) analysis of the daily global and local sunspot number (SSN) dataset.

The ascending phase to the 2024 peak exhibiting higher sunspot numbers than the previous cycle could indicate a stronger solar cycle or inherent variability between cycles [2, 29]. These findings have broad implications, influencing space weather, satellite communications, and terrestrial climate. During solar maxima, increased sunspot activity correlates with more frequent solar flares and coronal mass ejections, impacting Earth's magnetosphere and ionosphere [30]. In contrast, solar minima are characterized by reduced solar activity but are not devoid of solar events [31]. Figure 2(b), which illustrates the local daily total sunspot numbers recorded by NRIAG from 2010 to early 2023, provides a unique perspective that complements the global view. The chart reveals a cyclical pattern in the local sunspot number (LSSN) that mirrors global tendencies, albeit with generally fewer sunspots. Notable peaks occurring around 2014 and 2021, which are slightly offset from the global patterns, underscore the localized nature of these observations. Importantly, a discernible decline in sunspot activity is observed post-2015, with a marked decrease after 2020, indicating a potential local minimum phase or perhaps shifts in observational methods or data documentation practices. A critical aspect of the LSSN dataset is the existence of 3,435 missing entries (as indicated by blanks in figure 2(b))