Toward a Live Homogeneous Database of Solar Active Regions Based on SOHO/MDI and SDO/HMI Synoptic Magnetograms. I. Automatic Detection and Calibration

Recent studies indicate that a small number of rogue solar active regions (ARs) may have a significant impact on the end-of-cycle polar field and the long-term behavior of solar activity. The impact of individual ARs can be quantified based on their magnetic field distributions. This motivates us to build a live homogeneous AR database in a series of papers. In the first paper of the series, we develop a method to automatically detect ARs from 1996 onward based on SOHO/MDI and SDO/HMI synoptic magnetograms. The method has the advantages of excluding decayed ARs and unipolar regions and of being compatible with any available synoptic magnetograms. The flux and area of the identified ARs are calibrated using cotemporal SDO/HMI and SOHO/MDI data. The homogeneity and reliability of the database are further verified by comparison with other relevant databases. We find that ARs with weaker flux have a weaker cycle dependence, and that the weakness of cycle 24 relative to cycle 23 is most pronounced for the stronger ARs. Several basic parameters of the identified ARs, namely the location, area, and flux of their negative and positive polarities, are provided in the paper. This paves the way for new AR parameters that quantify the impact on the long-term behavior of solar activity, to be presented in the subsequent paper of the series. The constantly updated database, covering more than two full solar cycles, will benefit the understanding and prediction of the solar cycle. The database and the detection codes are accessible online.


INTRODUCTION
Active regions (ARs) on the Sun are regions where strong magnetic fields are concentrated. They originate from the interior dynamo process and correspond to the emergence of toroidal magnetic flux. The properties of AR emergence and subsequent decay provide vital clues for large-scale dynamo models. ARs also provide the seed field for subsequent cycles in the Babcock-Leighton (BL) dynamo framework (Babcock 1961; Leighton 1969).
In the context of the BL dynamo, the emergence and subsequent transport of tilted ARs on the solar surface account for the generation of the Sun's poloidal field, in particular its dipole component represented by the polar fields. The poloidal field is stretched (quasi-)linearly by differential rotation to regenerate the toroidal field for the AR emergence of the subsequent cycle. Hence a correlation between the polar field at cycle minimum and the amplitude of the next cycle is expected (Schatten et al. 1978), and it is confirmed by direct polar field observations (Jiang et al. 2007) and by polar field proxies (Wang & Sheeley 2009; Muñoz-Jaramillo et al. 2013). Therefore, if we can predict an AR's contribution to the end-of-cycle polar field (or axial dipole field), we can evaluate its impact on subsequent solar activity.
When ARs are used in studies of the solar cycle, they are usually approximated as bipolar magnetic regions (BMRs) whose leading and following polarities are morphologically symmetric. The initial contribution of a newly emerged BMR to the axial dipole field, $D^{i}_{\rm BMR}$, satisfies
$$D^{i}_{\rm BMR} \propto A^{3/2} \sin\alpha \cos\lambda$$
(Wang & Sheeley 1991; Yeates et al. 2023), where $A$, $\alpha$, and $\lambda$ are the area, tilt angle, and latitude of each BMR, respectively.
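As a minimal illustration of how this proportionality scales, the Python sketch below evaluates the relative initial dipole contribution of an idealized BMR. The function name `initial_dipole_proxy` and the arbitrary area units are assumptions for illustration; because the proportionality constant is omitted, only ratios between BMRs are meaningful.

```python
import math


def initial_dipole_proxy(area, tilt_deg, lat_deg):
    """Relative initial axial-dipole contribution of an idealized BMR.

    Implements D_i ∝ A**1.5 * sin(alpha) * cos(lambda); the proportionality
    constant is omitted, so only ratios between regions are meaningful.
    """
    alpha = math.radians(tilt_deg)
    lam = math.radians(lat_deg)
    return area ** 1.5 * math.sin(alpha) * math.cos(lam)


# Doubling the area at fixed tilt and latitude raises D_i by 2**1.5.
small = initial_dipole_proxy(area=1.0, tilt_deg=5.0, lat_deg=15.0)
large = initial_dipole_proxy(area=2.0, tilt_deg=5.0, lat_deg=15.0)
print(round(large / small, 2))  # 2.83
```

The superlinear dependence on area is one reason a single large, strongly tilted region can dominate the dipole budget.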
After a BMR has emerged, however, meridional flow and supergranular diffusion acting in combination can cause its axial dipole field to grow or decay, depending on the BMR's emergence latitude $\lambda$ (Wang & Sheeley 1991). The final contribution of the BMR to the dipole field at the end of a cycle, $D^{f}_{\rm BMR}$, obeys
$$D^{f}_{\rm BMR} \propto D^{i}_{\rm BMR} \exp\left(-\frac{\lambda^2}{2\lambda_R^2}\right), \qquad (1)$$
where $\lambda_R$ depends on the transport processes (Jiang et al. 2014; Whitbread et al. 2018; Petrovay et al. 2020). This indicates that a single AR with a large flux and tilt angle emerging at low latitudes can have a dramatic impact on the dipole field at the end of the cycle and on the further course of cyclic solar magnetic activity (Jiang et al. 2015). Although AR emergence shows systematic properties in latitude and tilt angle (Jiang et al. 2011), it also has strong stochastic components. At low latitudes, large ARs with large tilt angles can emerge randomly during a solar cycle; such regions are referred to as rogue ARs (Nagy et al. 2017). From the point of view of solar cycle prediction, it is important to define a parameter describing the deviation of the dipole contribution from the case with no stochastic perturbations in AR emergence (Petrovay 2020). Such a parameter was first proposed by Nagy et al. (2020) as the degree of rogueness (ARDoR).
Realistically, however, ARs are not BMRs; they have various configurations (Hale et al. 1919; Künzel 1960). Jaeggli & Norton (2016) indicate that about 30% of β-type ARs observed during solar maximum years carry the additional classification γ and/or δ; such regions usually have large areas, and their realistic tilt angles are hard to quantify because of their complex multipolar configurations. Jiang et al. (2019) and Yeates (2020) assimilate the real magnetic configurations of ARs into their surface flux transport (SFT) simulations, in which the final dipole field $D^{f}_{\rm BMR}$ and the initial dipole field $D^{i}_{\rm BMR}$ no longer obey Eq. (1); a sign change between the two is common for δ-type spots. Hence the realistic AR magnetic field distribution is required to accurately predict the contributions of individual ARs to the final dipole. Wang et al. (2021) further provide a quick and precise quantification of the contribution of an AR with central latitude $\lambda$ to the final dipole field, $D^{f}_{\rm AR}$, without SFT simulations, via Eq. (2), where $B(\theta, \phi)$ is the magnetic field distribution of the identified AR, and $\theta$ and $\phi$ are the co-latitude and longitude, respectively. Thus, given $B(\theta, \phi)$, we may quickly predict an AR's contribution to the end-of-cycle polar field.
arXiv:2308.06914v1 [astro-ph.SR] 14 Aug 2023
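The Gaussian latitude dependence of the final-to-initial dipole ratio used in the SFT literature cited above can be sketched as follows. This is a minimal illustration only: the constant prefactor is dropped, and the value `LAMBDA_R_DEG = 10.0` is a hypothetical choice, since $\lambda_R$ depends on the assumed transport model.

```python
import math


def final_dipole_factor(lat_deg, lambda_r_deg):
    """Latitude-dependent factor exp(-lambda**2 / (2 * lambda_R**2)).

    Multiplying an initial dipole contribution by this factor gives its final
    (end-of-cycle) contribution up to a constant prefactor. lambda_R is set
    by the surface transport model and is a free input here.
    """
    return math.exp(-lat_deg ** 2 / (2.0 * lambda_r_deg ** 2))


LAMBDA_R_DEG = 10.0  # hypothetical value, for illustration only

for lat in (5.0, 15.0, 25.0):
    print(lat, round(final_dipole_factor(lat, LAMBDA_R_DEG), 3))
```

The factor falls off rapidly with latitude, which is why low-latitude rogue ARs matter most for the end-of-cycle dipole.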
Although a small number of rogue ARs cause large variations of the polar field, the cumulative effect of many weaker regions is also significant (Whitbread et al. 2018; Hofer et al. 2023). Hence, for a better understanding of the solar cycle, especially the variability of the polar field, we need a complete AR catalog. To this end, the database must also be long-term and homogeneous, and it must be updated constantly so that the impact of ARs on solar cycles can be monitored and predicted.
The aforementioned progress in understanding the effect of individual ARs on the solar cycle motivates us to develop a live homogeneous database of ARs for a better understanding and prediction of the solar cycle. Several AR databases are already available, for example, the RGO and USAF/NOAA AR Database, the Bipolar Active Region Detection (BARD; Muñoz-Jaramillo et al. 2021), the Space-weather HMI Active Region Patches (SHARPs; Bobra et al. 2014), the Space-weather MDI Active Region Patches (SMARPs; Bobra et al. 2021), and the database of Sreedevi et al. (2023). However, these databases only provide AR parameters relevant to space-weather effects. Yeates (2020) offers an exceptional database, in which both the initial and final dipole fields of each AR, calculated from SFT simulations, are provided. The two space-based instruments, SOHO/MDI (Scherrer et al. 1995) and SDO/HMI (Scherrer et al. 2012), provide continuous, seamless, high-resolution synoptic magnetograms covering solar cycles 23, 24, and 25, from 1996 to the present day. These magnetograms give us the opportunity to build an AR database for the understanding and prediction of the solar cycle.
This paper commences a series of studies toward a live homogeneous database of solar ARs based on SOHO/MDI and SDO/HMI synoptic magnetograms. In this first paper, we develop a method, compatible with any available synoptic magnetograms, to automatically detect ARs. The identified AR flux is calibrated based on the co-temporal SDO/HMI and SOHO/MDI data. The homogeneity and reliability of the database are further verified by comparison with other relevant data. Several basic AR parameters, namely the location, area, and flux of the negative and positive polarities of the identified ARs, are provided. In the second paper, we will provide and analyze parameters quantifying the impact of individual ARs on the long-term behavior of solar activity, e.g., the final contribution to the dipole field and the degree of rogueness, based on the ARs automatically identified in this paper.
This paper is organized as follows. In Section 2, we describe the algorithms for automatic AR detection. In Section 3, we calibrate the detection results using the co-temporal SDO/HMI and SOHO/MDI data. In Section 4, we compare our data with other available databases to evaluate our method and the homogeneity of the data. In Section 5, we give an overview of the properties of the detected ARs in our database. We summarize and discuss the results in Section 6.

Observed synoptic magnetograms
The data used in this study are radial synoptic magnetograms constructed from full-disk magnetograms obtained by the Michelson Doppler Imager on board the Solar and Heliospheric Observatory (SOHO/MDI; Scherrer et al. 1995) and the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory (SDO/HMI; Scherrer et al. 2012).
Both MDI and HMI synoptic magnetograms are 1440×3600 pixels in size. Each pixel spans 0.1° in Carrington longitude and about 0.00139 in sine-latitude, corresponding to an area of about 1.17 Mm². … 1909, 1910, 1937, 1941-1946, 1956, 1986, 2004, 2005, 2011, 2015, 2052, and 2086. All the missing data mentioned above are due to the SOHO spacecraft malfunction. ARs in maps with partially missing data are still detected but should be used with care.
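The quoted pixel area follows from the equal-area geometry of the sine-latitude grid. The sketch below checks it, assuming a solar radius of 696 Mm (an assumption; the text does not state the value used) and that the 1440 rows span sine-latitude from -1 to 1.

```python
import math

R_SUN_MM = 696.0              # assumed solar radius in Mm
N_SINLAT, N_LON = 1440, 3600  # map size: sine-latitude rows x longitude columns

D_LON = math.radians(360.0 / N_LON)  # 0.1 degree of Carrington longitude, in radians
D_SINLAT = 2.0 / N_SINLAT            # sine-latitude spans [-1, 1], ~0.00139 per pixel


def pixel_area_mm2(radius_mm=R_SUN_MM):
    """Area of one pixel of an equal-area (sine-latitude) synoptic map.

    On such a grid dA = R**2 * d(sin lat) * d(lon), the same for every pixel.
    """
    return radius_mm ** 2 * D_SINLAT * D_LON


area = pixel_area_mm2()
print(f"{area:.2f} Mm^2 per pixel")             # 1.17 Mm^2 per pixel
print(f"{351 * area:.0f} Mm^2 for 351 pixels")  # 412 Mm^2 for 351 pixels
```

Because every pixel has the same area, pixel counts convert directly to areas, and since 1 Mm² = 10¹⁶ cm², a pixel with field B (in G) carries about 1.17 × 10¹⁶ B Mx.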

Algorithm of Automatic detection
Since ARs typically emerge at middle and low latitudes and some high-latitude data in MDI and HMI magnetograms are missing, we limit our detection to within ±60° latitude of the synoptic maps. Our AR detection algorithm operates on unsigned magnetic fields and relies on standard image processing techniques such as morphological operations and region growing. The process of AR detection is illustrated in Figure 1. … magnetograms generated by different instruments. This module primarily removes quiet-Sun magnetic fields while retaining ARs and decayed ARs.
The second module preliminarily removes the decayed ARs and identifies AR kernel pixels. It applies a morphological closing operation followed by an opening operation, producing the image shown in Panel (c). The size of the closing kernel (K2) is set to 3 pixels, while the size of the opening kernel (K3) is set to 11 pixels for MDI maps and 9 pixels for HMI maps. To obtain the kernel pixels of each AR, we use the opening operation to preliminarily remove non-AR features and some branches of ARs (non-kernel pixels). Since some small regions segmented in Panel (a) may belong to the same AR, we first merge them with the closing operation to prevent their removal by the opening operation. The detected kernel pixels of each AR with magnetic fields stronger than an intensity threshold are used as seeds in the third module.
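A minimal pure-Python sketch of Module 2 is given below. It assumes square k×k structuring elements, ignores the periodicity of synoptic maps in longitude, and treats off-map pixels as background during erosion; the function names are hypothetical, and the released detection code may differ in these details.

```python
def dilate(mask, k):
    """Binary dilation of a 2-D list of bools with a k x k square (k odd)."""
    r = k // 2
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                for ii in range(max(0, i - r), min(h, i + r + 1)):
                    for jj in range(max(0, j - r), min(w, j + r + 1)):
                        out[ii][jj] = True
    return out


def erode(mask, k):
    """Binary erosion with a k x k square; off-map pixels count as background."""
    r = k // 2
    h, w = len(mask), len(mask[0])
    return [[all(0 <= ii < h and 0 <= jj < w and mask[ii][jj]
                 for ii in range(i - r, i + r + 1)
                 for jj in range(j - r, j + r + 1))
             for j in range(w)]
            for i in range(h)]


def closing(mask, k):
    return erode(dilate(mask, k), k)   # fills gaps narrower than the kernel


def opening(mask, k):
    return dilate(erode(mask, k), k)   # deletes features narrower than the kernel


def module2_kernel_pixels(mask, k2=3, k3=11):
    """Closing with K2 merges nearby fragments, then opening with K3 strips
    thin branches and small decayed features, leaving AR kernel pixels."""
    return opening(closing(mask, k2), k3)


# Toy example with a scaled-down opening kernel: two 7x3 fragments separated
# by a one-pixel gap, plus an isolated pixel.
mask = [[False] * 15 for _ in range(15)]
for i in range(1, 8):
    for j in (1, 2, 3, 5, 6, 7):
        mask[i][j] = True
mask[12][12] = True
kernel = module2_kernel_pixels(mask, k2=3, k3=5)
print(sum(map(sum, kernel)))  # 49: fragments merged into one 7x7 kernel, isolated pixel removed
```

Running the closing first is what lets the two fragments survive the opening: on their own, each 3-pixel-wide fragment would be deleted by a 5×5 opening.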
The third module employs region growing to obtain all pixels belonging to each AR. It identifies all pixels connected to the seeds with a field strength greater than the intensity threshold, generating Panel (d). For MDI magnetograms, the intensity threshold is set to 50 G, consistent with the threshold used in various AR detection studies (Zhang et al. 2010; McAteer et al. 2005; Virtanen et al. 2017; Yeates et al. 2007; Muñoz-Jaramillo et al. 2016). For HMI magnetograms, a threshold of 30 G is chosen by trial and error to ensure consistency with the AR areas detected in MDI magnetograms. Because Module 1 uses adaptive threshold segmentation to identify nearly all ARs and AR segments, region growing recovers all possible ARs. However, the segmentation also picks up some decayed ARs, part of which survive Module 2, so region growing recovers these segments as well. As decayed ARs typically break up into small unipolar segments, the subsequent two modules remove them based on a critical area and on flux imbalance, respectively.
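Region growing from seeds can be sketched as a breadth-first flood fill restricted to pixels above the threshold. This is a minimal illustration, assuming 8-connectivity (the text does not specify the connectivity); `region_grow` and `THRESHOLD_G` are hypothetical names, and the toy example applies both thresholds to the same field only to show the effect of the threshold.

```python
from collections import deque

THRESHOLD_G = {"MDI": 50.0, "HMI": 30.0}  # region-growing thresholds from the text


def region_grow(field, seeds, threshold):
    """Grow AR masks from seed pixels into 8-connected pixels with |B| > threshold.

    field: 2-D list of signed B in gauss; seeds: 2-D list of bools (kernel pixels).
    Seeds that do not themselves exceed the threshold are ignored.
    """
    h, w = len(field), len(field[0])
    strong = [[abs(field[i][j]) > threshold for j in range(w)] for i in range(h)]
    grown = [[False] * w for _ in range(h)]
    queue = deque()
    for i in range(h):
        for j in range(w):
            if seeds[i][j] and strong[i][j]:
                grown[i][j] = True
                queue.append((i, j))
    while queue:  # breadth-first flood fill restricted to strong pixels
        i, j = queue.popleft()
        for ii in range(max(0, i - 1), min(h, i + 2)):
            for jj in range(max(0, j - 1), min(w, j + 2)):
                if strong[ii][jj] and not grown[ii][jj]:
                    grown[ii][jj] = True
                    queue.append((ii, jj))
    return grown


field = [[0, 0, 0, 0, 0, 0],
         [80, -60, 40, 10, 90, 100],
         [0, 0, 0, 0, 0, 0]]
seeds = [[False] * 6 for _ in range(3)]
seeds[1][0] = True
mdi = region_grow(field, seeds, THRESHOLD_G["MDI"])  # grows to columns 0-1
hmi = region_grow(field, seeds, THRESHOLD_G["HMI"])  # grows to columns 0-2
```

The strong pixels in columns 4-5 are not reached because they are not connected to any seed, which is how region growing keeps only ARs that already have a kernel.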
The fourth module eliminates small decayed AR segments using an area threshold (Ta). It involves a morphological closing operation and a small-region removal operation, yielding Panel (e). Ta is set to 351 pixels, equivalent to about 412 Mm². The magnetic flux of the smallest retained AR is 2.42 × 10²⁰ Mx. Whitbread et al. (2018) find that ARs with flux greater than 5 × 10²⁰ Mx are sufficient to replicate the polar field generated by all ARs. Since the smallest retained AR has less than half of this flux, Ta is low enough to retain the small ARs whose effect on the end-of-cycle polar field cannot be ignored. To avoid removing small regions that belong to the same AR, a closing operation with a kernel (K4) of 5 pixels is used to merge them before the small-region removal. This module removes some ephemeral regions and small decayed ARs, but larger decayed ARs are retained here and are removed in Module 5.
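Module 4 can be sketched as a closing followed by connected-component labeling and an area cut. Several details are assumptions rather than statements from the text: square kernels, 8-connectivity, and counting the area of each merged group in original (unclosed) pixels, with only original pixels returned. All function names are hypothetical.

```python
from collections import deque


def _dilate(mask, k):
    """Square k x k binary dilation (k odd)."""
    r = k // 2
    h, w = len(mask), len(mask[0])
    return [[any(mask[ii][jj]
                 for ii in range(max(0, i - r), min(h, i + r + 1))
                 for jj in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def _erode(mask, k):
    """Square k x k binary erosion; off-map pixels count as background."""
    r = k // 2
    h, w = len(mask), len(mask[0])
    return [[all(0 <= ii < h and 0 <= jj < w and mask[ii][jj]
                 for ii in range(i - r, i + r + 1)
                 for jj in range(j - r, j + r + 1))
             for j in range(w)]
            for i in range(h)]


def label_components(mask):
    """Label 8-connected components; returns (labels, count), 0 = background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for si in range(h):
        for sj in range(w):
            if mask[si][sj] and labels[si][sj] == 0:
                count += 1
                labels[si][sj] = count
                queue = deque([(si, sj)])
                while queue:
                    i, j = queue.popleft()
                    for ii in range(max(0, i - 1), min(h, i + 2)):
                        for jj in range(max(0, j - 1), min(w, j + 2)):
                            if mask[ii][jj] and labels[ii][jj] == 0:
                                labels[ii][jj] = count
                                queue.append((ii, jj))
    return labels, count


def module4_remove_small(mask, k4=5, ta=351):
    """Merge nearby fragments with a K4 closing, then drop merged groups whose
    original pixel count is below Ta. Returns a mask of original pixels only."""
    labels, count = label_components(_erode(_dilate(mask, k4), k4))
    area = [0] * (count + 1)
    for i, row in enumerate(mask):
        for j, on in enumerate(row):
            if on:
                area[labels[i][j]] += 1
    return [[on and labels[i][j] > 0 and area[labels[i][j]] >= ta
             for j, on in enumerate(row)]
            for i, row in enumerate(mask)]


# Toy example with scaled-down K4 and Ta.
mask = [[False] * 12 for _ in range(12)]
for i in range(1, 5):
    for j in (1, 2, 4, 5):   # two 4x2 fragments separated by column 3
        mask[i][j] = True
mask[10][10] = True          # isolated one-pixel region
kept = module4_remove_small(mask, k4=3, ta=10)
```

Each fragment alone has only 8 pixels and would fall below the toy threshold of 10; the closing joins them into one 16-pixel group that survives, while the isolated pixel is removed.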
The fifth module removes over-decayed ARs, which usually consist of unipolar regions, based on their flux imbalance (Fi) after neighboring regions are merged. These unipolar regions are typically parts of decayed ARs that were already detected when they first appeared on the magnetograms. They should not be counted again as newly emerging flux; otherwise they would distort the estimate of the end-of-cycle polar field. We calculate Fi for each AR and remove ARs with severe flux imbalance, i.e., with Fi greater than a threshold (Tf). The flux imbalance is defined as the ratio of the net flux to the total unsigned flux,
$$F_i = \frac{\left|\sum_j B_j\right|}{\sum_j |B_j|},$$
where the sums run over the pixels $j$ of the AR; the constant pixel area cancels. The threshold Tf is set to 0.5, consistent with the threshold used by Virtanen et al. (2017) and Yeates (2020) for unipolar AR removal. However, some newly emerging ARs have already been separated into several regions by surface flows by the time they appear in synoptic magnetograms. These regions may individually be unipolar and should not simply be removed. To avoid this, a morphological dilation with a kernel (K5) of 23 pixels is applied to merge neighboring regions before unipolar regions are removed. The resulting image is multiplied by Panel (e) to remove the redundant dilated area, resulting in Panel (f). Removing unipolar regions effectively eliminates over-decayed AR segments, while merging neighboring regions ensures that unipolar fragments of ARs in the early decay phase form a whole AR rather than being removed. Although some fragments survive by being connected to neighboring ARs, these two steps of Module 5 effectively remove the over-decayed ARs.
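The two steps of Module 5 can be sketched as follows: dilate to group neighboring regions, compute $F_i$ over the original pixels of each group, and keep only the original pixels of groups with $F_i \le T_f$ (the multiplication by Panel (e)). Square kernels, 8-connectivity, and all function names are assumptions for illustration.

```python
from collections import deque


def dilate(mask, k):
    """Square k x k binary dilation (k odd)."""
    r = k // 2
    h, w = len(mask), len(mask[0])
    return [[any(mask[ii][jj]
                 for ii in range(max(0, i - r), min(h, i + r + 1))
                 for jj in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def label_components(mask):
    """Label 8-connected components; returns (labels, count), 0 = background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for si in range(h):
        for sj in range(w):
            if mask[si][sj] and labels[si][sj] == 0:
                count += 1
                labels[si][sj] = count
                queue = deque([(si, sj)])
                while queue:
                    i, j = queue.popleft()
                    for ii in range(max(0, i - 1), min(h, i + 2)):
                        for jj in range(max(0, j - 1), min(w, j + 2)):
                            if mask[ii][jj] and labels[ii][jj] == 0:
                                labels[ii][jj] = count
                                queue.append((ii, jj))
    return labels, count


def flux_imbalance(values):
    """F_i = |sum B| / sum |B| for equal-area pixels (0 = balanced, 1 = unipolar)."""
    total = sum(abs(b) for b in values)
    return abs(sum(values)) / total if total > 0 else 1.0


def module5_remove_unipolar(field, mask, k5=23, tf=0.5):
    """Group neighbouring regions by K5 dilation, drop groups with F_i > Tf, and
    keep only the original (undilated) pixels of the surviving groups."""
    labels, count = label_components(dilate(mask, k5))
    groups = [[] for _ in range(count + 1)]
    for i, row in enumerate(mask):
        for j, on in enumerate(row):
            if on:
                groups[labels[i][j]].append(field[i][j])
    keep = [flux_imbalance(g) <= tf for g in groups]
    return [[on and keep[labels[i][j]] for j, on in enumerate(row)]
            for i, row in enumerate(mask)]


# Toy example with a scaled-down K5: a bipolar pair split by a 2-pixel gap
# and an isolated unipolar fragment.
field = [[0.0] * 20 for _ in range(10)]
for i in (4, 5):
    for j in (2, 3):
        field[i][j] = 100.0    # positive polarity
    for j in (6, 7):
        field[i][j] = -80.0    # negative polarity
    for j in (15, 16):
        field[i][j] = 100.0    # isolated unipolar fragment
mask = [[v != 0.0 for v in row] for row in field]
kept = module5_remove_unipolar(field, mask, k5=5, tf=0.5)
```

Taken separately, each polarity of the pair would have $F_i = 1$ and be removed; grouped together they give $F_i = 80/720 \approx 0.11$ and survive, while the isolated fragment is removed.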
Panel (f) shows a binary image of the identified ARs. Multiplying this panel by the initial magnetogram, Panel (a), yields the magnetic field distribution of the detected ARs, as shown in Panel (g).
Figure 2 shows examples of AR detection in MDI (left panel) and HMI (right panel) synoptic magnetograms taken at the maximum and minimum phases of solar cycles 23 and 24. The algorithm identifies all ARs in both MDI and HMI synoptic magnetograms and effectively removes most over-decayed AR segments. Although the algorithm was developed using MDI and HMI synoptic maps, it can be adapted to other available synoptic maps by adjusting the parameters of the detection modules.

Controlling Parameters of the algorithm
The algorithm detects the magnetic field distribution of all ARs. For a single synoptic magnetogram, the AR magnetic field distribution is characterized by three quantities: the AR number, AR area, and AR total unsigned flux. These quantities are influenced by the eight controlling parameters of the five modules, which were introduced in Subsection 2.2 together with their optimized values. To evaluate the impact of each controlling parameter on AR detection, we use the MDI synoptic magnetogram of CR 1968 as an example; the results are presented in Table 1. For each parameter, the table gives the lower and upper limits of its acceptable range. These ranges are not strict, and values beyond them could also work well.
Owing to the significant influence of small decayed ARs, some parameters produce noticeable differences (Diff) in AR number within their acceptable ranges. However, the Diff values of AR area and unsigned flux, relative to the results obtained with the optimized values, are generally below 10% for most parameters. Analyzing the acceptable range and Diff of each parameter shows that the parameters affect the algorithm to different degrees. The algorithm is insensitive to many parameters, particularly Ta in Module 4. Conversely, it is relatively sensitive to the closing kernel (K2) and opening kernel (K3) in Module 2 and to the closing kernel (K4) in Module 4, for which Diff in AR area or flux exceeds 10% or the acceptable ranges are narrow. Although only the results for the MDI synoptic magnetogram of CR 1968 are listed in Table 1, additional tests using the MDI map of CR 2070 and the HMI maps of CRs 2155 and 2215 yield similar conclusions.
The different sensitivities of the controlling parameters can be explained as follows. Ta in Module 4 is used to remove small decayed AR segments, which are typically unipolar and would be removed in Module 5 anyway unless they are connected to other ARs. Therefore, Ta has only a very slight effect on the results. In Module 2, the kernel pixels of ARs are obtained while decayed ARs are removed. The closing kernel (K2) and opening kernel (K3) govern the distinction between ARs and decayed ARs and, through the seeds, also the result of region growing in Module 3. The closing kernel (K4) in Module 4 controls the connection of different AR segments and thus affects the removal of small isolated regions and unipolar regions. Since K2, K3, and K4 affect not only the results of their own operations but also, strongly, those of the subsequent operations, the detection result is more sensitive to them. For HMI maps, the acceptable range of K3 is 7-11 pixels and the optimized value is 9 pixels, different from those for MDI maps. The reason for this difference is given in Section 3.

CALIBRATION AND COMPARISON OF RESULTS FROM SOHO/MDI AND SDO/HMI CO-TEMPORAL MAGNETOGRAMS
To ensure the homogeneity of the AR detection results obtained from MDI and HMI synoptic magnetograms, we perform two calibration processes using the co-temporal magnetograms of CRs 2097-2107. First, we calibrate the controlling parameters used for MDI and HMI maps. When the MDI parameters are applied, the AR number and area detected in HMI maps are smaller than those detected in MDI maps. To obtain consistent results from the two instruments, we reduce the opening kernel (K3) in Module 2 from 11 to 9 pixels and lower the region-growing threshold from 50 G to 30 G; these are the values given in Subsection 2.2. After this calibration of the controlling parameters, the AR number and area detected in co-temporal MDI and HMI synoptic maps are consistent, as illustrated in Figure 4.
Table 1 note: There are three values for each parameter, the lower limit, the optimized value, and the upper limit, listed from top to bottom. The acceptable range for each parameter spans from the lower limit to the upper limit. The parameter Diff refers to the difference of the detection result within the acceptable range, given by Dif…
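The instrument-specific settings described in Subsection 2.2 and in this section can be collected in one place. The configuration object below is a hypothetical sketch: the field names are invented, the values are those stated in the text (including the 1.36 flux factor derived below), and the Module 1 parameter is omitted because it is not specified in this section.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectionParams:
    """Controlling parameters of Modules 2-5 plus a flux calibration factor."""
    k2_closing_px: int = 3          # Module 2 closing kernel
    k3_opening_px: int = 11         # Module 2 opening kernel
    grow_threshold_g: float = 50.0  # Module 3 region-growing threshold (G)
    k4_closing_px: int = 5          # Module 4 closing kernel
    ta_area_px: int = 351           # Module 4 area threshold (~412 Mm^2)
    tf_imbalance: float = 0.5       # Module 5 flux-imbalance threshold
    k5_dilation_px: int = 23        # Module 5 dilation kernel
    flux_scale: float = 1.0         # multiply detected flux to put it on the MDI scale


MDI = DetectionParams()
# HMI: smaller opening kernel and lower threshold, plus the 1.36 flux scaling.
HMI = replace(MDI, k3_opening_px=9, grow_threshold_g=30.0, flux_scale=1.36)
```

Deriving the HMI settings from the MDI ones with `dataclasses.replace` makes explicit that only the opening kernel, the growing threshold, and the flux scale differ between the two instruments.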
Second, we compare the area and unsigned flux of ARs detected in both MDI and HMI synoptic magnetograms to calibrate the AR flux and to further check the effect of the parameter calibration. In the overlap period, 67 ARs are detected in MDI maps and 64 in HMI maps. The different AR numbers stem from the instruments' different resolutions and measured magnetic field strengths, which cannot be calibrated by adjusting the controlling parameters. Nevertheless, 56 ARs are detected in both maps, accounting for over 80% of the ARs detected by each instrument. This is similar to the 57 ARs identified by NOAA (Bobra et al. 2021).
Based on the comparison shown in Figure 3, the areas detected in MDI and HMI maps are highly consistent, which confirms the accuracy of the parameter calibration. However, there is a noticeable difference in flux: MDI gives a larger flux than HMI for the same AR. The slope of the fitted function is approximately 1.36. This is similar to the results of Liu et al. (2012), who find scaling factors of 1.31 for fields stronger than 600 G, 1.44 for fields weaker than 600 G, and 1.40 for all pixels.
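A minimal sketch of deriving and applying such a flux factor is shown below. It assumes a least-squares line through the origin; the text does not state which fitting method was used, and a fit with an intercept or in log space would be equally plausible. The function names and the synthetic data are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope s of y = s * x (no intercept)."""
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / sxx


def to_mdi_scale(hmi_flux, factor=1.36):
    """Scale HMI AR fluxes (Mx) onto the MDI scale."""
    return [f * factor for f in hmi_flux]


# Synthetic check: perfectly proportional fluxes recover the input factor.
hmi = [1.0e21, 3.0e21, 8.0e21]
mdi = [1.36 * f for f in hmi]
print(round(slope_through_origin(hmi, mdi), 2))  # 1.36
```

Forcing the fit through the origin encodes the expectation that an AR with no HMI flux should have no MDI flux, so the result is a single multiplicative factor.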
Based on this calibration, we scale the AR flux detected in HMI maps by a factor of 1.36. Figure 4 displays the AR number, area, and calibrated flux from co-temporal MDI and HMI magnetograms. Although the AR number differs in some CRs, the AR area and flux are highly consistent between MDI and HMI maps, except in CRs 2104 and 2106. Some data are missing in the MDI synoptic maps of CRs 2105 and 2106. In CR 2105, the missing data mainly cover weak fields without any ARs, whereas in CR 2106 they include an AR. As a result, no remarkable difference is found in CR 2105, but all three quantities differ in CR 2106. For CR 2104, two small ARs and one decayed AR are detected in the HMI map but not in the MDI map, although their effects on area and flux are slight. Figure 4 thus indicates that some differences remain between the MDI- and HMI-based results after calibration; these affect the AR number but only slightly affect the AR area and flux. We note that it is the AR area and flux that matter for our objective, namely the study of the solar surface magnetic field, while the AR number is relatively unimportant.
The above calibration methods, i.e. calibration of controlling parameters and scaling of AR flux, are also required for synoptic magnetograms from other instruments when the algorithm is applied.

COMPARISON WITH OTHER DATABASES
By applying the automatic detection method to MDI and HMI synoptic magnetograms and calibrating the results between them, we generate a homogeneous database of ARs covering cycles 23, 24, and part of 25. To demonstrate the validity, homogeneity, and advantages of the method and database, we first compare in detail the detection result for one CR map with those of Zhang et al. (2010) and NOAA, and then compare the AR number, area, and flux in cycles 23 and 24 with more databases. … For studies of the evolution of the solar surface magnetic field, decayed ARs should not be included, because they were already included in the data when they first emerged. Meanwhile, we detect ARs No. 1 and 8, which are not identified by Zhang et al. (2010). Part of AR No. 8 is also detected by NOAA, supporting the validity of our detection. Additionally, our results are more complete than those of Zhang et al. (2010) for the commonly detected ARs No. 3, 5, and 12. The corresponding ARs identified by Zhang et al. (2010) are nearly unipolar regions, whereas we detect each of them entirely by obtaining both polarities in Module 1 and merging them into a single AR in Module 5.
Compared with NOAA, our method detects 14 ARs, whereas NOAA lists 19. The different AR numbers result from two factors. First, some NOAA ARs are decayed ARs, which our method removes by design. Second, ARs No. 10 and 11 each correspond to more than one NOAA AR, reflecting different ways of grouping ARs when they are crowded on the solar surface. Such grouping differences affect the identified AR number, but not the total area and flux, which are the essential quantities for our objective. AR No. 1 is not detected by NOAA because its flux is small (1.82 × 10²¹ Mx) and it might not have mature sunspots (Cho et al. 2015). For our objective, such small magnetic regions are important (Whitbread et al. 2018; Hofer et al. 2023). Except for ARs No. 1, 10, and 11, most ARs detected by us correspond to one NOAA AR.
Overall, our detection is similar to those of Zhang et al. (2010) and NOAA. However, our method has the advantage of properly obtaining both polarities of ARs and removing over-decayed ARs, especially unipolar regions.

An overall comparison of the results in cycles 23 and 24
To further test our database over a long time range, we compare the detection results in cycles 23 and 24 with the monthly mean sunspot number, the USAF/NOAA sunspot number and area, Whitbread et al. (2018) (hereafter WYM), BARD (Muñoz-Jaramillo et al. 2016; Muñoz-Jaramillo et al. 2021), and SMARPs and SHARPs (Bobra et al. 2021, 2014). USAF/NOAA observes sunspot groups daily and contains multiple records for each sunspot group, so we select each sunspot group at the time of its maximum area. Based on NSO line-of-sight (LOS) synoptic maps, WYM apply Gaussian smoothing and intensity-threshold segmentation (Yeates et al. 2015) to obtain AR properties between CR 1641 and CR 2196, covering cycles 21-24. BARD uses morphological analysis and limited human supervision on magnetograms from NSO (1996-1999), SOHO/MDI (1996-2011), and SDO/HMI (2010-2016) … observed over the last two solar cycles, from 1996 to the present.
The comparison results are presented in Figure 6. The AR (sunspot) number, area, and flux of all databases are calculated for each CR and smoothed over nine CRs. The gap in our data and in BARD in 1998 is due to the SOHO spacecraft malfunction. For comparison, the three quantities of all databases are scaled so that their cycle 23 values have a similar amplitude.
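The smoothing and scaling steps can be sketched as below. Two details are assumptions: the nine-CR smoothing is taken to be a centered boxcar mean whose window is truncated at the ends of the series, and the scaling is taken to be division by each database's cycle 23 peak, which is one possible reading of "adjusted to a similar strength". The function names and data are hypothetical.

```python
def running_mean(series, window=9):
    """Centered boxcar mean over `window` CRs; the window is truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def scale_to_reference_peak(series, start, stop):
    """Divide by the series maximum within indices [start, stop), e.g. cycle 23."""
    peak = max(series[start:stop])
    return [v / peak for v in series]


counts = [0, 0, 0, 0, 9, 0, 0, 0, 0]                   # hypothetical per-CR values
print(running_mean(counts)[4])                          # 1.0
print(scale_to_reference_peak([2.0, 4.0, 1.0], 0, 3))   # [0.5, 1.0, 0.25]
```

Truncating the window keeps the smoothed series the same length as the input, at the cost of noisier values in the first and last four CRs.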
In AR number, our database is lower than those based on full-disk maps (USAF/NOAA, BARD) because synoptic maps have limited time resolution, so some small ARs present in full-disk magnetograms are missing from them. However, our AR number is greater than that of WYM, which also uses synoptic maps, because their method tends to merge more ARs into one than ours does. Besides, our method is more sensitive to small ARs and detects some that their method misses. These small ARs should not be disregarded, because the cumulative impact of numerous small ARs on the end-of-cycle polar field is also significant (Whitbread et al. 2018; Hofer et al. 2023). In AR area, our database is greater than BARD because BARD applies a higher detection threshold, which removes AR pixels with relatively weak magnetic fields. AR area is not provided by WYM. The USAF/NOAA sunspot area is much smaller than our AR area, which is reasonable because ARs cover not only sunspots but also the surrounding faculae, which are much larger than the spots (Chapman et al. 1997).
In AR flux, our database agrees closely with BARD over the whole period, although our values are lower. One reason is that ARs are sampled at different evolutionary phases: BARD records each AR at its maximum flux, whereas ours are taken at central meridian crossing. In comparison, WYM is notably weaker overall than both our database and BARD, although it exhibits relatively stronger fluxes around 1999. This could be due to the use of different magnetograms: NSO by WYM versus MDI and HMI by us and BARD.
To assess the homogeneity of our database, we compare the ratios of cycle 24 to cycle 23 for the three quantities among the aforementioned databases. For all quantities, the ratios are calculated from the peak values of the two cycles, and they are presented in Table 2. Although Figure 6 does not display SMARPs and SHARPs, we include their ratios in Table 2 for a comprehensive comparison.
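The peak-ratio calculation can be sketched as follows. The index windows marking each cycle are hypothetical inputs here; the section does not give the exact CR ranges used to delimit cycles 23 and 24.

```python
def cycle_peak_ratio(series, cycle23, cycle24):
    """Ratio of the peak of `series` in the cycle-24 index window to that in
    the cycle-23 window; windows are (start, stop) index pairs."""
    peak23 = max(series[cycle23[0]:cycle23[1]])
    peak24 = max(series[cycle24[0]:cycle24[1]])
    return peak24 / peak23


smoothed = [1.0, 5.0, 2.0, 1.0, 3.0, 2.0]          # hypothetical smoothed per-CR values
print(cycle_peak_ratio(smoothed, (0, 3), (3, 6)))  # 0.6
```

Multiplying a whole series by a constant leaves this ratio unchanged, so the ratios do not depend on the cycle 23 scaling used for plotting. Rescaling only part of a series does change the ratio, however, which is why an MDI-to-HMI flux calibration matters to the extent that the cycle 24 peak is measured with HMI.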
We find that the AR-number ratio of our database exceeds those of the other databases. This discrepancy can be attributed to the merging of neighboring regions in Module 5, which combines separated regions into a single AR but also merges closely located ARs into one. Given the considerably stronger activity during cycle 23, more ARs are merged in that cycle, which lowers its AR number and results in a slightly higher ratio. On the other hand, the area and flux ratios of our database lie in the middle of the range spanned by the other databases, and our area and flux ratios are almost identical. This similarity is consistent with the well-known linear correlation between AR area and AR flux (Sheeley 1966), implying the homogeneity of our database in terms of area and flux. Compared with the sunspot number, our database exhibits a higher AR-number ratio and lower area and flux ratios. The other databases show similar results, except NOAA, whose detection is based on sunspots. Furthermore, the AR-number ratio exceeds the area and flux ratios both in our database and in the other databases. To understand the cause of the different number and area ratios, and whether they indicate inhomogeneity in our database, we analyze ARs of different strengths in cycles 23 and 24. Based on their unsigned flux, we separate ARs into three categories: strong ARs (|flux| > 10²² Mx), medium ARs (4 × 10²¹ Mx < |flux| < 10²² Mx), and weak ARs (|flux| < 4 × 10²¹ Mx). The thresholds are set according to Wang & Sheeley (1989). Among the 2579 ARs, there are 1266 strong, 640 medium, and 673 weak ARs.
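The three-way classification can be written as a small function. The text uses strict inequalities, so the rule that values exactly at 4 × 10²¹ or 10²² Mx fall into the medium class is an assumption of this sketch; the names and sample fluxes are hypothetical.

```python
from collections import Counter

STRONG_MX = 1e22   # strong: |flux| > 1e22 Mx
WEAK_MX = 4e21     # weak:   |flux| < 4e21 Mx


def strength_class(flux_mx):
    """Classify an AR by unsigned flux; exact boundary values go to 'medium'
    (the text uses strict inequalities, so this tie rule is an assumption)."""
    f = abs(flux_mx)
    if f > STRONG_MX:
        return "strong"
    if f < WEAK_MX:
        return "weak"
    return "medium"


fluxes = [2.5e22, 6.0e21, 1.0e21, 1.0e22, 3.0e21]   # hypothetical AR fluxes (Mx)
print(Counter(strength_class(f) for f in fluxes))
```

Taking the absolute value first means signed net fluxes or single-polarity fluxes of either sign are classified the same way.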
The number, area, and flux of the three categories are shown in Figure 7, and their ratios of cycle 24 to cycle 23 are listed in Table 2. For all three categories, the area ratio is nearly equal to the flux ratio, but the difference between the area and number ratios varies among them. The area and number ratios differ significantly for strong ARs, whereas they are similar for medium and weak ARs. For strong ARs, the area and flux of cycle 24 are less than half of those of cycle 23. As the AR strength decreases, the difference between the two cycles in area and flux decreases. For weak ARs, the area and flux of cycle 24 are almost the same as those of cycle 23. The number ratios show a similar trend, and cycle 24 even has slightly more weak ARs than cycle 23. Overall, the ratio of total AR numbers is 82%, which lies between the ratios of strong and weak ARs. However, the ratios of total AR area and unsigned flux are similar to those of strong ARs, because weak and medium ARs contribute little to the total area and flux.
de Toma et al. (2013) also find a notable decrease in the number of large sunspots during cycle 23 compared to cycle 22, whereas the number of small sunspots remains relatively consistent between the two cycles. In agreement with de Toma et al. (2013), our results indicate that variations across solar cycles are driven primarily by strong ARs (large sunspots), while weak ARs (small sunspots) change little with cycle strength. Moreover, our results indicate that the different area and number ratios between cycle 24 and cycle 23 arise primarily from the strong ARs and are unlikely to signify inhomogeneity within our database. Rather, they reflect an intrinsic property of the solar cycle, implying that the relative strength of solar cycles depends on the activity index used. The widely adopted measure of cycle strength is based on the synthetic index of sunspot number. We will further verify the index dependence of cycle strength using the sunspot area data since 1874 and other historical datasets in a forthcoming study.

BASIC AR PARAMETERS OF THE DATABASE
The automatic detection method yields the magnetic field distribution of each AR, from which parameters characterizing each AR can be calculated. In this first paper, we provide only several basic parameters: the latitude and longitude of the flux-weighted centroids of the two polarities and of the whole AR, and the area and flux of each polarity. The CR number and label jointly identify a unique AR. Table 3 lists … The detection algorithm and the full database of detected ARs are freely accessible at https://github.com/Wang-Ruihui/A-live-homogeneous-database-of-solar-activeregions.
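The basic parameters listed above can be computed from the pixels of a detected AR roughly as follows. This is a minimal sketch, not the released code: the input format, the use of |B| as the weight for the whole-AR centroid, and the function name are assumptions, and longitudes are averaged directly, so regions straddling 0°/360° would need unwrapping first.

```python
def basic_parameters(pixels, pixel_area_cm2):
    """Basic parameters of one AR from its pixels.

    pixels: list of (lat_deg, lon_deg, B_gauss) tuples; centroids are weighted
    by |B| (flux-weighted on an equal-area grid). Longitudes are averaged
    directly, so regions straddling 0/360 degrees would need unwrapping first.
    """
    def centroid(sel):
        weight = sum(abs(b) for _, _, b in sel)
        if weight == 0:
            return None
        lat = sum(abs(b) * la for la, _, b in sel) / weight
        lon = sum(abs(b) * lo for _, lo, b in sel) / weight
        return lat, lon

    pos = [p for p in pixels if p[2] > 0]
    neg = [p for p in pixels if p[2] < 0]
    return {
        "centroid": centroid(pixels),
        "centroid_pos": centroid(pos),
        "centroid_neg": centroid(neg),
        "area_pos_cm2": len(pos) * pixel_area_cm2,
        "area_neg_cm2": len(neg) * pixel_area_cm2,
        "flux_pos_mx": sum(b for _, _, b in pos) * pixel_area_cm2,  # G cm^2 = Mx
        "flux_neg_mx": sum(b for _, _, b in neg) * pixel_area_cm2,
    }


PIXEL_AREA_CM2 = 1.17e16   # ~1.17 Mm^2 per synoptic-map pixel
ar = [(10.0, 100.0, 100.0), (12.0, 100.0, 100.0), (10.0, 104.0, -50.0)]
params = basic_parameters(ar, PIXEL_AREA_CM2)
print(params["centroid_pos"], params["centroid_neg"])  # (11.0, 100.0) (10.0, 104.0)
```

Keeping the polarities separate preserves the information needed later for dipole estimates, since the polarity separation and orientation follow directly from the two centroids.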

CONCLUSION AND DISCUSSION
In order to provide a homogeneous AR database for the understanding and prediction of the solar cycle, we propose a new method to automatically detect ARs from MDI and HMI synoptic magnetograms, calibrate the detections from MDI and HMI maps, and provide several basic parameters of the detected ARs.
Our AR detection method is based on morphological operations and region growing. It can process synoptic magnetograms with varying magnetic field strengths and maps from different instruments. It identifies all candidate ARs and removes unipolar regions. Unipolar regions are typically parts of decayed ARs and should be excluded from our database; otherwise, they would bias the analysis of the end-of-cycle polar field. Our detections are broadly similar to those of Zhang et al. (2010) and NOAA, but our method has the advantage of properly detecting both polarities of ARs and removing unipolar regions.
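The five modules illustrated in Figure 1 can be approximated with standard morphological tools. The sketch below uses `scipy.ndimage` and is not the authors' implementation: Module 1 uses a fixed threshold instead of an adaptive one, every numeric default (`t_high`, `t_low`, `k`, `min_size`, `merge_radius`, `min_balance`) is a hypothetical stand-in for the controlling parameters K1-K5, Ta, and C of Table 1, and the unipolar test is a simple flux-balance criterion assumed for illustration.

```python
import numpy as np
from scipy import ndimage

def detect_ars(br, t_high=100.0, t_low=30.0, k=3, min_size=20,
               merge_radius=2, min_balance=0.2):
    """Return an integer label map of candidate bipolar ARs (0 = background)."""
    absb = np.abs(br)
    st = np.ones((k, k), dtype=bool)

    # Module 1 (simplified): fixed threshold on |B| instead of an adaptive one.
    seeds = absb > t_high
    # Module 2: closing bridges small gaps, opening removes isolated noise pixels.
    seeds = ndimage.binary_opening(ndimage.binary_closing(seeds, st), st)
    # Module 3: region growing from the seeds into connected pixels with |B| > t_low.
    grown = ndimage.binary_propagation(seeds, mask=absb > t_low)
    # Module 4: closing again, then drop regions smaller than min_size pixels.
    grown = ndimage.binary_closing(grown, st)
    lab, _ = ndimage.label(grown)
    sizes = np.bincount(lab.ravel())
    big = np.flatnonzero(sizes >= min_size)
    keep = np.isin(lab, big[big > 0])  # label 0 is background
    # Module 5a: merge regions lying within ~merge_radius pixels of each other.
    merged, n = ndimage.label(ndimage.binary_dilation(keep, iterations=merge_radius))
    merged = merged * keep
    # Module 5b: remove (nearly) unipolar regions with a flux-balance test.
    out = np.zeros_like(merged)
    new_id = 0
    for i in range(1, n + 1):
        region = merged == i
        pos = br[region & (br > 0)].sum()
        neg = -br[region & (br < 0)].sum()
        if max(pos, neg) > 0 and min(pos, neg) / max(pos, neg) >= min_balance:
            new_id += 1
            out[region] = new_id
    return out

# Toy map: a bipolar pair plus an isolated unipolar patch (illustrative only).
toy_map = np.zeros((40, 40))
toy_map[10:16, 8:14] = 300.0    # positive polarity
toy_map[10:16, 16:22] = -300.0  # negative polarity
toy_map[28:34, 25:31] = 300.0   # unipolar region, expected to be removed
labels = detect_ars(toy_map)
```

The flux-balance ratio is one simple way to express "unipolar"; a region whose weaker polarity carries less than `min_balance` of the stronger polarity's flux is discarded.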
To obtain a homogeneous AR database, we calibrate the detections from the MDI and HMI synoptic magnetograms by adjusting the controlling parameters and the detected unsigned flux. A comparative analysis of ARs detected on both maps during the overlap period shows that the AR flux in MDI maps is approximately 1.36 times that in HMI maps, consistent with the calibration of Liu et al. (2012).
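The flux calibration amounts to estimating a single scale factor from ARs detected on both instruments and dividing the MDI fluxes by it. A minimal sketch, assuming the MDI and HMI detections have already been matched into pairs and that the factor is fitted by least squares through the origin; this fitting choice is an assumption, and only the value 1.36 comes from the text. The function names are hypothetical.

```python
import numpy as np

def fit_scale_through_origin(hmi_flux, mdi_flux):
    """Least-squares k minimizing sum((mdi - k * hmi)**2), i.e. mdi ~ k * hmi."""
    x = np.asarray(hmi_flux, dtype=float)
    y = np.asarray(mdi_flux, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))

def mdi_to_hmi_scale(mdi_flux, k=1.36):
    """Rescale MDI-based AR fluxes to the HMI level by dividing by k."""
    return np.asarray(mdi_flux, dtype=float) / k

# Synthetic matched pairs from the overlap period (illustrative only).
hmi = np.array([1e21, 5e21, 2e22])
mdi = 1.36 * hmi
k = fit_scale_through_origin(hmi, mdi)  # recovers ~1.36 for this synthetic input
calibrated = mdi_to_hmi_scale(mdi, k)
```

A fit through the origin enforces pure proportionality, matching a multiplicative calibration; fitting in log space instead would weight weak and strong ARs more evenly.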
Compared with other datasets such as the sunspot number, the USAF/NOAA sunspot number and area, Whitbread et al. (2018), BARD, SMARPs, and SHARPs, our database shows a similar time evolution of AR number, area, and unsigned flux in cycles 23 and 24. However, the ratios of cycle 24 to cycle 23 differ among these databases for all three parameters. Specifically, our database and most others show a higher ratio of AR number than the widely used sunspot number, but relatively lower ratios of AR area and flux. In addition, the ratio of AR number consistently exceeds the ratios of AR area and flux in our database and in the other databases. Our analysis of ARs with different strengths in cycles 23 and 24 shows that the distinct ratios in AR number, area, and flux are driven mainly by the strong ARs, whereas the weak ARs show similar ratios for AR number and area. Hence, the different ratios are not caused by inhomogeneity within our database; rather, they show that the strength of the solar cycle varies with the index of solar activity used. Furthermore, weaker ARs exhibit a weaker dependence on the solar cycle, and the difference in strength between cycles 23 and 24 is caused primarily by strong ARs.
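Comparing time evolution across databases, as in Figure 6, requires smoothing each series: the caption cites the 13-month smoothed monthly sunspot number and AR data smoothed over nine CRs. A minimal sketch of both, assuming a plain boxcar for the nine-CR smoothing (the kernel is not specified here) and the conventional tapered 13-month mean, S_n = (R_{n-6} + 2(R_{n-5} + ... + R_{n+5}) + R_{n+6}) / 24. Function names are hypothetical, and both functions assume the series is longer than the window.

```python
import numpy as np

def running_mean(values, window=9):
    """Centered boxcar mean over an odd `window` (e.g. nine CRs); NaN where incomplete."""
    v = np.asarray(values, dtype=float)
    out = np.full(v.shape, np.nan)
    half = window // 2
    kernel = np.ones(window) / window
    out[half:len(v) - half] = np.convolve(v, kernel, mode="valid")
    return out

def smooth_13_month(monthly):
    """13-month smoothed series with half weights on the two end months."""
    m = np.asarray(monthly, dtype=float)
    w = np.ones(13)
    w[0] = w[-1] = 0.5
    w /= w.sum()  # raw weights sum to 12, so this divides by 12 (i.e. (... ) / 24 form)
    out = np.full(m.shape, np.nan)
    out[6:len(m) - 6] = np.convolve(m, w, mode="valid")
    return out

x = np.arange(20.0)
cr_smoothed = running_mean(x, 9)           # interior equals x for a linear series
ssn_smoothed = smooth_13_month(np.full(24, 5.0))
```

Both kernels are symmetric, so the flip inside `np.convolve` has no effect and each output value stays centered on its input sample.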
Although a few rogue ARs significantly affect the end-of-cycle polar field, the effect of small ARs cannot be ignored; their contribution to the polar field is even comparable to that of large ARs (Hofer et al. 2023). However, according to Whitbread et al. (2018), ARs with flux greater than 5 × 10²⁰ Mx are sufficient to replicate the polar field generated by all ARs. The weakest AR in our database has a flux of 2.42 × 10²⁰ Mx, so our database contains enough small ARs for research on surface magnetic field evolution and solar cycle prediction. Besides the advantages presented above, our database still has limitations. ARs are not necessarily detected in their fully emerged phase, owing to the time resolution of synoptic magnetograms and the lack of observations of the far side of the Sun. Some ARs are detected repeatedly because their long lifetimes make them appear in several synoptic maps. In addition, the database currently presents only several basic parameters. In subsequent studies, we will properly remove the repeated ARs and provide and analyze more useful parameters, particularly the final dipole field that quantifies the impact of ARs on the end-of-cycle polar field. We will also continuously update the database as new synoptic magnetograms are released.
Mm². The time ranges of the MDI and HMI magnetograms are CR 1909 (May 1996) to CR 2096 (May 2010) and CR 2097 (June 2010) to CR 2265 (December 2022), respectively, except for CRs 1938-1940, for which the MDI maps are entirely missing. In addition, the MDI data are partially missing in CRs

Figure 1. Illustration of the AR detection algorithm, using the synoptic map of CR 1968 as an example. Panel (a): the original map with the AR detection result contoured in orange. Panel (b): Module 1, adaptive intensity threshold segmentation. Panel (c): Module 2, morphological closing and opening operations. Panel (d): Module 3, region growing. Panel (e): Module 4, morphological closing operation and removal of small regions. Panel (f): Module 5, merging of neighboring regions and removal of unipolar regions. Panel (g): the final AR detection result.

Figure 2. Examples of ARs detected on synoptic magnetograms. MDI (left) and HMI (right) synoptic maps at the maximum (upper) and minimum (lower) phases of cycles 23 and 24 are shown. The four magnetograms are overplotted with orange lines outlining the detected ARs.
lower limit, the optimized value, and the upper limit of the three parameters, i.e., number, area, and unsigned flux. The parameters K1, K2, K3, K4, K5, and Ta are in units of pixels, and C is in units of gauss.
b Unsigned flux of detected ARs.

Figure 3. Comparison of the area and flux of ARs detected on both MDI and HMI synoptic magnetograms during the overlap period (CRs 2097-2107). Top (bottom): scatter plot of MDI AR area (unsigned flux) versus HMI AR area (unsigned flux).

Figure 4. Comparison of ARs detected during the 11 overlapping HMI and MDI CRs (CRs 2097-2107) after calibration of the detection parameters and the AR flux. From top to bottom: time evolution of the detected AR number, area, and unsigned flux.

Figure 5. Comparison of our results (bottom panel) with Zhang et al. (2010) and NOAA ARs (top panel) for CR 2000. The top panel is reproduced with permission from Zhang et al. (2010), copyrighted by the American Astronomical Society. ARs identified by Zhang et al. (2010) are shown in black boxes. NOAA ARs are marked by red circles, with plus symbols at the centers indicating the centroids. The yellow symbol denotes a non-spot plage region. The bottom panel is overplotted with orange contours outlining the borders of the detected ARs, with numbers labeling them.

Figure 6. Comparison of our results (black) with other databases in number (top), area (middle), and flux (bottom). Red: 13-month smoothed monthly total sunspot number; orange: USAF/NOAA sunspots; green: WYM; purple: BARD. Each parameter of these datasets is multiplied by the factor shown in the legend for comparison. All AR data are smoothed over nine CRs.

Figure 7. Statistical properties (number: top; area: middle; unsigned flux: bottom) of the detected ARs with different flux strengths in cycles 23 and 24. The area and flux of medium and weak ARs are multiplied by the factors shown in the legend for comparison with strong ARs.

Figure 8. Butterfly diagram of the ARs in our database for cycles 23, 24, and part of cycle 25. The color shows the average area of ARs. The gap in 1998 is due to the missing synoptic magnetograms in CRs 1938-1940.

b All longitudes and latitudes are in degrees. The symbol '+' refers to the positive polarity of the AR.
c The symbol '-' refers to the negative polarity of the AR.
d 'Whole' refers to the whole AR.

Table 1. Effects of the controlling parameters in the five modules of the AR detection algorithm on the detected AR number, area, and flux, taking the MDI synoptic magnetogram of CR 1968 as an example.
Column headings: Number, Area (mHem), USFlux^b (10²³ Mx)

Table 2. Ratios of cycle 24 to cycle 23 in different parameters.
a Databases used for comparison in Subsection 4.2.
b ARs with different strengths of our database, described in detail in Subsection 4.3.

Table 3. AR database^a
a Table 3 is published in its entirety in machine-readable format. The ARs in the CR 1968 synoptic map are shown here for guidance regarding its form and content.