Reconstructing Intrinsic Stellar Noise with Stellar Atmospheric Parameters and Chromospheric Activity

Accurately characterizing the intrinsic stellar photometric noise induced by stellar astrophysics, such as stellar activity, granulation, and oscillations, is of crucial importance for detecting transiting exoplanets. In this study, we investigate the relation between the intrinsic stellar photometric noise, quantified by the Kepler combined differential photometric precision (CDPP) metric, and the level of stellar chromospheric activity, indicated by the S-index of the Ca II H and K lines derived from LAMOST spectra. Our results reveal a clear positive correlation between the S-index and the robust rms values of CDPP, with the correlation becoming more significant at higher activity levels and on longer timescales. We have therefore built an empirical relation between the robust rms values of CDPP and the S-index as well as T_eff, log g, [Fe/H], and the apparent magnitude, with the XGBoost regression algorithm, using the LAMOST–Kepler common star sample as the training set. This method achieves a precision of ∼20 ppm for inferring the intrinsic noise from the S-index and other stellar labels on a 6 hr integration duration. We have applied this empirical relation to the full LAMOST DR7 spectra database and obtained the intrinsic noise predictions for 1,358,275 stars. The resultant catalog is publicly available and expected to be valuable for optimizing target selection for future exoplanet-hunting space missions, such as the Earth 2.0 mission.


INTRODUCTION
Activity driven by surface magnetism is prevalent in stars. It offers a probe of the dynamo mechanism operating in stellar interiors and contributes to our understanding of the role magnetism plays in stellar evolution (see Charbonneau & Sokoloff 2023, for a review). However, magnetic activity has a negative influence on the exploration of exoplanets, especially on the search for Earth-analog exoplanets, and on the study of exoplanetary system environments (e.g. Cegla 2019; Hatzes 2019). Activity-induced radial velocity variations can mimic or conceal the Doppler signatures of orbiting planets, making detections with the Doppler method difficult or even false (see Meunier 2021, for a review). In transit searches, stellar activity is also found to be one of the sources of the higher-than-expected noise in the Kepler photometric time series (Gilliland et al. 2011, 2015).
Because non-stationary noise affects the detectability of a candidate's transit signature, the transiting planet search (TPS) module of the Kepler pipeline (Jenkins et al. 2010; Tenenbaum et al. 2012) determines the noise level of each light curve in the Kepler photometric time series over 14 integration durations (i.e. [1.5, 2.0, 2.5, 3.0, 3.5, 4.5, 5.0, 6.0, 7.5, 9.0, 10.5, 12.0, 12.5, 15.0] hr). These noise metrics are referred to as the combined differential photometric precision (CDPP), which is intended to be either the observed noise in a specified temporal domain or the noise level predicted in the same temporal domain by rolling up all contributing factors (see Christiansen et al. 2012, for a detailed definition). The early Kepler on-orbit results (e.g. Christiansen & Machalek 2010) showed that the CDPP at the nominal 6.5 hr (one-half the duration of a central transit of a true Earth analog) for solar-type stars of 12th magnitude in the Kepler band is 30 parts per million (ppm), which is typically 50% higher than anticipated in the initial plan (Jenkins 2002). The transit signals of Earth-sized and smaller planets are therefore likely to be heavily drowned in noise.
It is necessary to perform an in-depth analysis of the noise properties, which may yield a better understanding of stellar noise and facilitate future planet-search missions. The properties of Kepler photometric noise have been studied systematically (e.g. Gilliland et al. 2011, 2015). By analyzing the early release of data from Quarters 2 to 6 in 2009-2010, Gilliland et al. (2011) showed that the noise observed by Kepler can be decomposed into a few terms: fundamental terms (Poisson and readout noise), added noise due to the instrument, and noise intrinsic to the stars. Among them, the intrinsic stellar noise, mainly due to stellar activity, turned out to be the major contributor to CDPP; this strongly deviates from expectations, since the term is twice its budgeted value (Jenkins 2002). Considering data spanning 4 years, Gilliland et al. (2015) repeated a similar analysis of the noise observed by Kepler. On the one hand, they found that the instrumental noise levels dropped with the inclusion of more data, particularly the more recently processed data. On the other hand, they showed that the intrinsic stellar noise levels remained almost unchanged with the adoption of the newer data release, which is reasonable since updates to the pipeline cannot remove intrinsic stellar noise.
Two questions remain unanswered. How is stellar activity related to photometric noise or, more specifically, to intrinsic stellar noise? And is there a way to predict the photometric noise and the intrinsic stellar noise level from the level of stellar activity? The answers to these questions are crucial for prioritizing targets in future Earth-analog exoplanet searches, such as the Earth 2.0 (ET) space mission (Ge et al. 2022a,b; Zhang et al. 2022), a proposed Chinese space mission to detect thousands of small/low-mass exoplanets over a wide range of orbital periods. With the spectroscopic observations of the LAMOST survey (Zhao et al. 2006; Cui et al. 2012; De Cat et al. 2015; Yan et al. 2022), we can measure stellar chromospheric activity for millions of stars. This allows us to investigate the dependence of photometric noise on stellar activity and to tackle these questions.
In this work, we focus on FGK-type stars to investigate the impact of chromospheric activity on intrinsic noise. We then predict the noise by accounting for chromospheric activity as well as stellar fundamental parameters for stars observed by the entire LAMOST survey. In Section 2, we explain the sample selection. In Section 3, we analyze the relation between stellar activity levels and stellar intrinsic noise for solar-type stars. We predict the noise level for stars in the LAMOST field using machine learning algorithms in Section 4. Sections 5 and 6 present the discussion and conclusion, respectively.

STELLAR SAMPLES
We use the Kepler Stellar Properties Catalog for Q1-Q17 DR25 Transit Search, which includes the robust rms of the CDPP (hereafter, rrms CDPP) over different integration durations from 1.5 to 15 hours (Mathur & Huber 2016). One-half the duration of a central transit of a true Earth analog is nearly 6.5 hours. Accordingly, we divide the available 6-hr rrms CDPPs by (13/12)^0.5 = 1.041 to approximate the 6.5-hr rrms CDPPs.
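The factor 1.041 is consistent with noise that averages down as the inverse square root of the integration time. The following is a minimal sketch of that rescaling under this white-noise reading; the function name `scale_cdpp` and the 30 ppm input are illustrative assumptions, not part of the catalog or pipeline.

```python
import math

def scale_cdpp(cdpp_ppm, t_from_hr=6.0, t_to_hr=6.5):
    """Rescale a CDPP value between integration durations.

    Assumes the noise scales as 1/sqrt(duration), which is the
    scaling implied by the (13/12)^0.5 factor quoted in the text.
    """
    return cdpp_ppm / math.sqrt(t_to_hr / t_from_hr)

factor = math.sqrt(13.0 / 12.0)   # same as sqrt(6.5 / 6.0)
cdpp_6p5 = scale_cdpp(30.0)       # illustrative 6-hr value of 30 ppm
print(f"factor = {factor:.3f}, 6.5-hr CDPP = {cdpp_6p5:.2f} ppm")
```

Red (correlated) stellar noise does not average down this way, so the rescaling is only an approximation over the short 6 to 6.5 hr step.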
We derive the stellar parameters, namely effective temperature (T_eff), surface gravity (log g), and metallicity ([Fe/H]), from the LAMOST DR7 low-resolution spectra using the data-driven Payne (DD-Payne; Xiang et al. 2019). For our purposes, we restrict attention to FGK-type stars with T_eff in the range 3800-6500 K. Stars hotter than 6500 K, which generally fall into the classical instability strip, are excluded. In addition, both eclipsing binaries (Kirk et al. 2016) and Kepler objects of interest flagged in DR25 as hosting planet candidates (Thompson et al. 2018) are excluded.
Table 1 (partial): coefficients of the polynomial terms, one column per integration duration. The label of the first row is not available.
(unlabeled) -3.106706 -4.083044 -4.050601 -4.998701 -5.231168 -5.231163 -5.210254 -5.258316 -4.977679 -4.682999 -3.927053
p4 9.652102 12.633736 12.711457 15.574175 16.361260 16.343797 16.418002 16.583189 15.767899 14.908123 12.679827

To derive the activity proxy, i.e. the S-index of the Ca II H and K lines, we then consider only stars whose LAMOST spectra have signal-to-noise ratios (S/Ns) higher than 50, which places a lower limit on the quality of the spectroscopic observations. Our final sample includes 39,056 stars. Following the methods described in Zhang et al. (2020a), we calculate the S-index as the ratio of the integrated fluxes in the cores of the Ca II H and K lines to that in the nearby pseudo-continuum. To perform the integration, we use a triangular function with a full width at half-maximum (FWHM) of 1.09 Å centered at 3968 Å and 3934 Å for the H and K lines, respectively. For the nearby pseudo-continuum, we use a rectangular function with a width of 20 Å centered at 4001 Å and 3901 Å.
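The band integration described above can be sketched as follows. This is a minimal illustration, not the Zhang et al. (2020a) implementation: it assumes a plain ratio (H + K)/(R + V) with no normalization constants, trapezoidal integration, and hypothetical function names. The flat test spectrum is made up.

```python
def triangle(wave, center, fwhm=1.09):
    """Triangular bandpass with unit peak; reaches zero at +/- fwhm."""
    return max(0.0, 1.0 - abs(wave - center) / fwhm)

def boxcar(wave, center, width=20.0):
    """Rectangular bandpass of the given full width."""
    return 1.0 if abs(wave - center) <= width / 2.0 else 0.0

def band_flux(waves, fluxes, weight):
    """Trapezoidal integral of flux * weight over wavelength."""
    total = 0.0
    for i in range(len(waves) - 1):
        left = fluxes[i] * weight(waves[i])
        right = fluxes[i + 1] * weight(waves[i + 1])
        total += 0.5 * (left + right) * (waves[i + 1] - waves[i])
    return total

def raw_s_index(waves, fluxes):
    """Ratio of Ca II H+K core flux to pseudo-continuum flux (assumed form)."""
    h = band_flux(waves, fluxes, lambda w: triangle(w, 3968.0))
    k = band_flux(waves, fluxes, lambda w: triangle(w, 3934.0))
    r = band_flux(waves, fluxes, lambda w: boxcar(w, 4001.0))
    v = band_flux(waves, fluxes, lambda w: boxcar(w, 3901.0))
    return (h + k) / (r + v)

# Sanity check on a flat spectrum sampled every 0.01 A from 3880 to 4020 A.
waves = [3880.0 + 0.01 * i for i in range(14001)]
flat = [1.0] * len(waves)
s_flat = raw_s_index(waves, flat)
print(f"S-index of a flat spectrum: {s_flat:.4f}")
```

For a flat spectrum each triangular band integrates to its FWHM (1.09 Å) and each continuum band to 20 Å, so the ratio is 2.18/40, a convenient check that the bandpasses are set up as described.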
Repeat measurements of common stars in LAMOST observations provide a good estimate of the internal precision of our S-index measurements. In our sample, 5837 stars have been observed by LAMOST on at least three visits. For 84% of these stars, the S-index scatter among individual measurements is below 0.02 dex. To better understand possible systematics in the S-index measurements, we have also carried out an external comparison with literature S-index values derived from high-resolution spectra. A detailed discussion of the results of the external comparison is presented in Sect. 5. Briefly, both a systematic trend and considerable scatter are present between our measurements and the literature. However, as the effectiveness of our data-driven method of inferring stellar intrinsic noise is mostly determined by internal consistency rather than by the absolute scale of the S-index measurements, we expect the results presented in this work to be robust, given the small internal errors of our S-index measurements.

determining intrinsic stellar noise
In order to quantify the intrinsic stellar noise, we first subtract the photon noise and readout noise from the CDPP. It has been shown that the lower bound of the distribution of rms CDPP versus Kepler magnitude is the minimum noise floor, with contributions from both photon noise, which is pure Poisson noise and depends only on magnitude, and the typical readout noise (Christiansen et al. 2012). Thus, we determine the lower bound by applying a 4th-order polynomial fit to the rrms CDPP at the bottom 0.5th percentile of points within 0.2-magnitude-wide bins, as a function of stellar apparent magnitude in the Gaia DR3 G bandpass (hereafter G; Jordi et al. 2010). The polynomial function is given by

$$\min(\log \mathrm{rrmsCDPP}_k) = p_0 G^4 + p_1 G^3 + p_2 G^2 + p_3 G + p_4, \qquad (1)$$

where rrmsCDPP_k is the observed overall noise on the k-hr integration duration, and p_0, p_1, p_2, p_3, and p_4 are the coefficients of the polynomial terms. Figure 1 illustrates an example of the polynomial fit to the entire sample of Kepler targets for the 6.5-hr integration duration. Given that the primary range of stellar magnitude in the Kepler band is Kp = 9-15 (Koch et al. 2010), the fit is applied to all Kepler targets with G in the range 8-16, which covers our sample stars of interest. We perform similar polynomial fits to determine the noise floor over 11 integration durations (i.e. [1.5, 2.5, 3.5, 5.0, 6.0, 6.5, 7.5, 9.0, 10.5, 12.0, 15.0] hr). The corresponding coefficients of the polynomial terms are listed in Table 1. The polynomial relations are then subtracted in quadrature from the rrms CDPP measurements.
According to Gilliland et al. (2015), the instrumental noise can be approximated by 13% of the squared rrms CDPP. We thus estimate the intrinsic stellar noise as

$$\mathrm{rrmsCDPP}_{\rm intrinsic}^2 = \mathrm{rrmsCDPP}^2 - \left(10^{\min(\log \mathrm{rrmsCDPP})}\right)^2 - 0.13\,\mathrm{rrmsCDPP}^2. \qquad (2)$$

We note that taking a constant fractional value of 13% as the instrumental noise is a simplified approximation; a more realistic estimate of the Kepler instrumental noise may need to consider possible variation among detector channels. For the current work, we do not expect this approximation to cause a serious problem, given the relatively small contribution of this term compared to the photometric noise and the intrinsic noise. However, a more detailed characterization of the Kepler instrumental noise would be very helpful for future studies.
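As a worked illustration of Eq. (2), the sketch below subtracts the noise floor and the 13% instrumental term in quadrature. The function name, the choice to return `None` when the residual variance is not positive, and the numerical inputs are assumptions made for this example; the text does not state how such cases are handled.

```python
import math

INSTRUMENT_FRACTION = 0.13  # fraction of squared rrms CDPP (Gilliland et al. 2015)

def intrinsic_rrms_cdpp(rrms_cdpp_ppm, log_floor):
    """Intrinsic stellar noise in ppm following Eq. (2).

    log_floor is min(log10 rrmsCDPP) at the star's magnitude, i.e. the
    value of the polynomial lower envelope. Returns None when the
    remaining variance is not positive (an assumption of this sketch).
    """
    floor_ppm = 10.0 ** log_floor
    variance = (1.0 - INSTRUMENT_FRACTION) * rrms_cdpp_ppm ** 2 - floor_ppm ** 2
    return math.sqrt(variance) if variance > 0 else None

# Illustrative star: 40 ppm observed, 20 ppm noise floor.
example = intrinsic_rrms_cdpp(40.0, math.log10(20.0))
print(f"intrinsic noise = {example:.1f} ppm")
```

A star sitting on the lower envelope has no positive residual once the instrumental term is removed, which is why the sketch guards against taking the square root of a negative number.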

relation between intrinsic noise and stellar activity for solar-type stars
To explore the relation between stellar activity and intrinsic noise, we first focus on solar-type stars with T_eff, log g, and [Fe/H] in the ranges 5677-5877 K, 4.34-4.54, and −0.1 to 0.1, respectively. We also limit the sample to a narrow magnitude range (Kp = 11.5-12.5) to avoid introducing a dependence of the noise on magnitude.
Figure 2 shows the distribution of intrinsic rrms CDPP for different integration durations as a function of S-index for the 77 selected solar-type stars. In every integration-duration panel, the intrinsic rrms CDPP is clearly positively correlated with the S-index. The Spearman rank-order correlation coefficients r_s are 0.45, 0.72, and 0.79 on the 1.5-hr, 6.5-hr, and 15.0-hr timescales, respectively. That is, the correlation is more significant for longer integration durations. The reason is that the behavior arising from magnetic activity and rotation of solar-type stars is better revealed on longer timescales (Basri et al. 2013). This result is compatible with Gilliland et al. (2011), who showed, based on a synthetic population, that activity dominates at high stellar noise for solar-type stars.
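For reference, the Spearman coefficient is the Pearson correlation of the ranks of the two variables. The sketch below implements it with the standard library only; the S-index and noise values are made up for illustration and are not taken from the sample.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank-order correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example values (not from the paper's sample).
s_index = [0.16, 0.18, 0.20, 0.22, 0.25, 0.28]
noise_ppm = [22.0, 25.0, 24.0, 31.0, 48.0, 70.0]
r_s = spearman(s_index, noise_ppm)
print(f"r_s = {r_s:.3f}")
```

Because only ranks enter, the coefficient is insensitive to the strongly nonlinear rise of the noise at high activity, which makes it a reasonable choice for this kind of relation.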
The bottom-left panel of Figure 2 also shows the location of the Sun in the S-index versus intrinsic noise plane. The solar S-index is a mean of measurements by LAMOST during activity cycles 15-24 (Zhang et al. 2020b), while the solar intrinsic noise is taken from Gilliland et al. (2011). The Sun follows the same relation as the other solar-type stars but has the lowest intrinsic noise and activity level, which might be one of the factors that allow it to host a habitable planet.
To give a more explicit picture, we divide the S-index, whose values span from 0.16 to 0.28, into 6 bins and then derive the average S-index as well as the median intrinsic rrms CDPP in each bin for all the data sets shown in Figure 2 (red dots). We further plot these median intrinsic rrms CDPPs against the average S-index on different timescales in Figure 3. The median intrinsic rrms CDPPs on all timescales lie mainly in the range 20-30 ppm at the low-activity end, with the 1.5-hr values slightly higher than the others. Moreover, the dependence strengthens with increasing S-index and becomes significant for stars with S-index higher than ∼0.22. At the high-activity end, the noise levels on different timescales separate from each other, ranging from 40 to 90 ppm, with the 15-hr noise being twice the 1.5-hr noise. This remarkable difference in intrinsic noise among timescales shows the larger contribution of stellar activity to stellar noise on longer timescales, as mentioned previously. This result could provide valuable guidance for selecting targets in future Earth 2.0 searches. As a reference, inactive stars with S-index < 0.22, which corresponds to an intrinsic noise lower than ∼30 ppm, should be favored to increase the success rate of Earth 2.0 detection.
We note that these results are based on solar-type stars with activity levels estimated from LAMOST low-resolution spectra. The S-index value depends on the resolving power of the instrument: the lower the resolving power, the more strongly the line cores are blended with the wings and, consequently, the larger the measured Ca II H & K fluxes. We inspect the influence of different resolving powers on our results in Sect. 5, where we collect stars with S-index measured both from the LAMOST low-resolution spectra and from other high-resolution spectra. After calibrating the S-index from the LAMOST scale to the conventional scale of the Mount Wilson Observatory (hereafter, MWO) HK Project (Wilson 1978), we find that the increasing trends of stellar noise with activity level are still significant, except that the inflection point in S_MWO lies slightly beyond the value of 0.22 found on the LAMOST scale. The median intrinsic rrms CDPP below the inflection point is mainly concentrated in the 20-30 ppm range on timescales shorter than 12 hours (see Figure 10).
The dispersion shown in Figure 2 could be caused by the intrinsic variability of stellar activity. For solar-type stars, the variations of the S-index due to the intrinsic variability of activity are expected to be similar to those of the Sun (i.e., about 10% from the mean value; Egeland et al. 2017). On the other hand, the different ages of these stars could also lead to variations of the S-index values. According to Zhang et al. (2020b), solar-type stars with near-solar rotation periods have chromospheric activity that is systematically higher than that of stars with undetected rotation periods, which partly reflects the differences in their ages.

Sect. 3.2 has shown the significant impact of stellar activity on intrinsic stellar noise by focusing on the solar-type star sample. The correlation between stellar activity and intrinsic noise is expected to be a ubiquitous effect for all stars with a radiative core and a convective envelope. Thus, we use a machine learning method to quantitatively characterize the relationship between the intrinsic stellar noise and stellar properties, i.e., stellar activity (S-index), atmospheric parameters (T_eff, log g, [Fe/H]), and magnitude (G):

$$\mathrm{rrmsCDPP}_{\rm intrinsic} = f(S, \log T_{\rm eff}, \log g, [\mathrm{Fe/H}], G), \qquad (3)$$

in which T_eff is used in its logarithmic form. Knowing the noise of targets from these parameters is important for target selection in transiting planet search missions.
We predict individually the intrinsic rrms CDPPs on three representative timescales, i.e. 1.5-hr, 6.0-hr, and 15-hr.Since the 6.5-hr noise metrics are estimated by the accessible 6-hr noise metrics (see details in Section 2), we predict the 6-hr noise metrics instead of 6.5-hr noise metrics.We also predict the observed overall rrms CDPPs using the same stellar properties for comparison.Note that the overall rrms CDPPs and intrinsic rrms CDPPs are predicted in their logarithmic form given they span several orders of magnitude.
For each noise metric, we apply the XGBoost regression algorithm to build a model. The XGBoost algorithm is an improved implementation of the gradient boosting method, which combines many weak predictors in a way that maximizes their joint predictive power (Friedman et al. 2000; Friedman 2001, 2002; Chen & Guestrin 2016). It uses a more regularized model formalization to control overfitting and thus gives better performance (Chen & Guestrin 2016).
We split the sample data into training and test sets with a ratio of 7:3. Before splitting, we eliminate stars with missing input labels and discard both the top and bottom 0.5th percentile of data points for each label, as the XGBoost algorithm is sensitive to outliers. To apply the XGBoost regression algorithm, we empirically set the optional hyperparameters, including the number of regression trees, the maximum depth, the learning rate, and the minimum child weight. To obtain the best set of values for these hyperparameters, we perform a 5-fold cross-validated grid search using the scikit-learn GridSearchCV routine (Pedregosa et al. 2011). We tune the hyperparameters to best fit our data without over- or underfitting, using the coefficient of determination of the predictions, i.e. the R^2 score, as the evaluation metric. The R^2 score is the proportion of the variance in the dependent variable that is predictable from the independent variables. It generally ranges from 0 to 1 and indicates how much of the variation in the data set the model captures, i.e., the accuracy of the prediction on average. The estimated R^2 over n samples is defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad (4)$$

where y_i is the observed value of the i-th sample, ŷ_i is the corresponding predicted value, and ȳ is the mean of the observed values. After training the algorithms to predict the observed overall noise and the intrinsic stellar noise on the training set, we apply the trained algorithms to the test set and obtain the corresponding predicted noise metrics.
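The R^2 score used as the evaluation metric can be computed directly from the residual and total sums of squares, R^2 = 1 − SS_res/SS_tot, which is the standard definition also used by scikit-learn's `r2_score`. The sketch below is a plain-Python illustration; the observed and predicted values are made up.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Illustrative log10 noise values (not from the paper).
observed = [1.2, 1.5, 1.8, 2.1]
predicted = [1.3, 1.4, 1.9, 2.0]
score = r2_score(observed, predicted)
print(f"R^2 = {score:.3f}")
```

Note that the score equals 1 for a perfect prediction and drops below 0 when a model performs worse than simply predicting the mean of the observed values.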

examination of model performance with test set
Figure 4 compares the observed and predicted rrms CDPP for the test set on the 1.5-hr, 6.0-hr, and 15.0-hr timescales individually. The mean bias (µ) and the corresponding scatter (σ) are given in each panel. The scatter of the prediction for the overall noise is 0.078 on the 6.0-hr timescale, equivalent to 17 ppm, indicating that the S-index, T_eff, log g, [Fe/H], and G can be used to predict the observed noise to this level. The R^2 scores on the 1.5-hr, 6.0-hr, and 15.0-hr timescales are 0.93, 0.90, and 0.85, respectively, which means that the prediction is closer to the observations and more precise for shorter integration durations.
Figure 5 compares the observed and predicted intrinsic rrms CDPP on the 1.5-hr, 6.0-hr, and 15.0-hr timescales individually. The comparison for the intrinsic noise returns a scatter of 0.131 on the 6.0-hr timescale, equivalent to 19 ppm, with the separate scatters for giants (log g ≤ 4) and dwarfs (log g > 4) being 16 and 21 ppm, respectively. The R^2 scores for the intrinsic noise on the 1.5-hr, 6.0-hr, and 15.0-hr timescales are 0.85, 0.84, and 0.81, respectively. Compared to the prediction of the overall rrms CDPP (Figure 4), the scatter of the intrinsic noise prediction is slightly larger (19 ppm versus 17 ppm), and the R^2 scores are slightly lower. We believe this is mainly due to uncertainties introduced in the process of extracting the intrinsic stellar noise from the overall rrms CDPP.
Figure 6 shows the residuals between the intrinsic and predicted rrms CDPP on the 6.0-hr timescale as a function of each label. Using the label importance provided by the XGBoost algorithm, we arrange the panels in descending order of the contribution of each label to the intrinsic noise. The intrinsic noise is most strongly related to log g, as the mean noise level decreases from 107 ppm for giants to 46 ppm for dwarfs in our sample. However, as shown in panel (a), the residuals of the XGBoost prediction are notably small for giants. As our main interest is in dwarf stars, we display only the residuals for dwarfs with log g > 4 dex in the other panels (b-e). The figure illustrates that our model achieves a good prediction of the intrinsic noise of dwarfs, without significant bias with respect to the stellar labels.
In order to quantify how our results are affected by measurement errors in the input labels, we perform Monte Carlo experiments for the test set. For each star in the test set, we randomly draw 500 sets of labels from Gaussian distributions centered on the measured values, with dispersions equal to the measurement errors. We re-predict the intrinsic rrms CDPP for the 6.0-hr case using the optimal XGBoost model trained above. The scatter of the 500 rrms CDPP predictions is then calculated for each star. The median value of the scatter for the whole test set is 15 ppm (0.106 dex); it is 12 ppm for giants (log g ≤ 4) and 17 ppm for dwarfs (log g > 4).
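The Monte Carlo procedure can be sketched for a single star as follows. This is a minimal illustration under stated assumptions: `toy_model` is a hypothetical linear stand-in for the trained regressor, the labels and errors are made up, and the "scatter" is taken to be the sample standard deviation, which the text does not specify.

```python
import random
import statistics

def mc_prediction_scatter(predict, labels, errors, n_draws=500, seed=0):
    """Scatter of predictions when labels are perturbed by Gaussian errors.

    predict: callable mapping a list of labels to a predicted value.
    labels, errors: measured label values and their 1-sigma errors.
    """
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_draws):
        perturbed = [rng.gauss(value, err) for value, err in zip(labels, errors)]
        predictions.append(predict(perturbed))
    return statistics.stdev(predictions)

def toy_model(labels):
    """Hypothetical stand-in for the trained model (log10 noise, linear)."""
    s_index, logg = labels
    return 1.0 + 2.0 * s_index - 0.1 * logg

# Made-up star: S-index 0.20 +/- 0.01, log g 4.4 +/- 0.05.
scatter = mc_prediction_scatter(toy_model, [0.20, 4.4], [0.01, 0.05])
print(f"prediction scatter = {scatter:.4f} dex")
```

For this linear toy model the expected scatter is sqrt((2 × 0.01)^2 + (0.1 × 0.05)^2) ≈ 0.021 dex, so the Monte Carlo result can be checked against analytic error propagation; for a tree-based model no such closed form exists, which is why sampling is used.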
Given that the total scatter of the intrinsic noise prediction is ∼19 ppm for dwarfs, while the scatter arising from uncertainties in the stellar labels is ∼15 ppm, the square root of the difference of their squares suggests an extra component of ∼12 ppm, which is likely contributed by uncertainties in the intrinsic rrms CDPP estimates. This also means that the underlying relation between the rrms CDPP noise and the stellar labels is rather tight, as any intrinsic scatter, if it exists, should be smaller than 12 ppm.
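The ∼12 ppm figure follows from subtracting the two scatters in quadrature. The short check below uses the whole-test-set values quoted here (19 and 15 ppm) and, for comparison, the dwarf-only values quoted earlier (21 and 17 ppm); the function name is illustrative.

```python
import math

def residual_in_quadrature(total, known):
    """Component left after removing a known term in quadrature."""
    return math.sqrt(total ** 2 - known ** 2)

extra_all = residual_in_quadrature(19.0, 15.0)     # sqrt(361 - 225)
extra_dwarfs = residual_in_quadrature(21.0, 17.0)  # sqrt(441 - 289)
print(f"{extra_all:.1f} ppm, {extra_dwarfs:.1f} ppm")
```

Both pairs of values round to 12 ppm, so the conclusion does not depend on which of the two sets of numbers is adopted.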
To quantify possible statistical fluctuations in the XGBoost modeling process, we also carry out a bootstrap experiment. We randomly re-sample the training set from the original training sample 50 times. We then obtain 50 XGBoost models from these training sets and derive their rrms CDPP predictions for the test set. The scatter among the 50 rrms CDPP predictions for individual stars in the test set has a median value of 0.024 dex, indicating that the statistical uncertainty arising from the XGBoost modeling is negligible. This is not surprising, as the training sample is sufficiently large for our modeling purpose.

generating stellar noise for the LAMOST sample
We have so far obtained a series of trained models that enable predictions of photometric noise on different timescales for stars whose parameters fall within the trained parameter space. For the generalization set, we start with the catalog of A-, F-, G-, and K-type stars from the LAMOST DR7 (v1.1) low-resolution survey, which includes 6,199,917 spectra. The S-indexes are then determined for the 2,278,792 spectra in the catalog with S/Ns greater than 50. To obtain unique sources, we use the CDS X-match service in TOPCAT (Taylor 2005), taking into account the epoch of the Gaia DR3 stars. Targets are matched on R.A. and Dec. coordinates within 2.0 arcsec, and 1,763,868 stars are cross-matched. To maintain consistency with the source of the atmospheric parameters, we cross-match these stars with the catalog of stellar parameters of 7 million stars from the LAMOST DR7 spectra based on the data-driven Payne (see Section 2 for more details).

Note-The generated overall noise listed in columns 7-9 as well as the intrinsic noise listed in columns 10-12 are given in their logarithmic form. (This table is available in its entirety in machine-readable form.)
It is of crucial importance to ensure that the parameter space of the generalization set overlaps with that of our training set as much as possible. Hence, we further restrict the generalization set to stars whose labels (i.e. T_eff, log g, [Fe/H], G, and S-index) fall within the minimum and maximum values of the corresponding labels in the training set; 1,358,275 stars eventually remain. Table 2 lists the generalization set along with the labels and the predicted photometric noise (including overall noise and intrinsic noise) on different timescales for these 1,358,275 targets in the LAMOST field.
Figure 7 shows the distribution of the generated 6-hr intrinsic rrms CDPP in the T_eff-log g plane, together with model tracks from PARSEC (Nguyen et al. 2022). The noise levels clearly vary with stellar evolutionary phase. Among the stars with relatively low noise levels, e.g. log rrmsCDPP < 1.6, which are of special interest as targets for transiting-planet surveys, we find that some have ended their main-sequence phase and entered the sub-giant phase. The magnetic field of Sun-like stars wanes as they evolve off the main sequence, which leads to weak activity levels and negligible contributions to the intrinsic stellar noise. Although these evolved stars are promising targets in terms of their low activity levels, their internal structures and surface properties are unstable because of rapid evolutionary changes, making them inhospitable to life. Moreover, the orbital distance corresponding to the habitable zone, which is based on the stellar irradiance and the host star's SED, moves outward with increasing stellar luminosity during a star's evolution (e.g. Rushby et al. 2013; Luger & Barnes 2015; Ramirez & Kaltenegger 2014, 2016), which decreases the probability that terrestrial planets transiting these evolved stars appear in the light curves (Ramirez & Kaltenegger 2016).
For stars on the main sequence, we find that the noise levels tend to drop over time, which reflects the evolution of stellar activity and has been demonstrated in previous studies (e.g., Chen et al. 2021; Ye et al. 2024). Stellar magnetic activity is related to stellar rotation but also causes the star to lose angular momentum over time via braking by a magnetized wind, which offers the promise of tracing stellar ages with activity levels (see Brun & Browning 2017, for a review). In general, targets prioritized for their low intrinsic stellar noise in transiting planet surveys would have evolved for a long time. In addition, we see that cool stars have higher noise levels than hot stars, which is commonly interpreted as an indication of the high activity levels of cool stars. In the right-hand area of the panel, there is a small number of stars with relatively high noise levels that may be misplaced because of large errors in log g; they are expected to lie in the lower-luminosity region.
The bulk of the less-luminous red giants are generally the noisiest. In this situation, there is little probability that transits of terrestrial planets in front of these evolved stars can be detected with the precision achievable by currently known missions. In addition, since the associated depth in the light curve varies as the squared ratio of the planet and star radii, (R_p/R_⋆)^2 (Heller 2019), giant stars with expanded outer layers would yield indiscernible transit depths for potential Earth-sized planets. In contrast, the noise of stars with log g < 2 dex is relatively low. Since the noise metrics of Kepler light curves are determined by decomposing the data in the time-frequency domain (Jenkins 2002), the low-frequency, long-period pulsations produced by these luminous red giants usually fall outside the high-pass filter on timescales of several hours, which leads to underestimated noise levels for these targets.
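To put the (R_p/R_⋆)^2 scaling into numbers, the sketch below evaluates the transit depth of an Earth-sized planet for a Sun-like star and for an illustrative giant of 10 solar radii. The radii are nominal values, the 10 R_⊙ giant is an assumption chosen for illustration, and limb darkening is ignored.

```python
R_SUN_KM = 695_700.0    # nominal solar radius
R_EARTH_KM = 6_378.1    # nominal equatorial Earth radius

def transit_depth_ppm(planet_radius_km, star_radius_rsun):
    """Transit depth (R_p / R_star)^2 in parts per million."""
    ratio = planet_radius_km / (star_radius_rsun * R_SUN_KM)
    return 1e6 * ratio ** 2

depth_sun_like = transit_depth_ppm(R_EARTH_KM, 1.0)
depth_giant = transit_depth_ppm(R_EARTH_KM, 10.0)  # illustrative 10 R_sun giant
print(f"Sun-like: {depth_sun_like:.0f} ppm, giant: {depth_giant:.2f} ppm")
```

An Earth analog produces a depth of roughly 84 ppm for a Sun-like star, comparable to the intrinsic noise levels discussed here, whereas for the giant the depth falls below 1 ppm, far under the noise of such stars.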

DISCUSSION
Our results have shown that there is a tight correlation between the stellar intrinsic photometric noise and the chromospheric activity (S-index), and that this relation can be used to predict the intrinsic photometric noise from the spectroscopic S-index and stellar atmospheric parameters, which is valuable for exoplanet detection. An important factor that affects the robustness of our method is the S-index measurement. While our analysis above has shown that the internal error of the LAMOST S-index measurements for our sample stars is small (< 0.02), an external comparison with literature values from high-resolution spectra is useful for revealing any systematic uncertainties in our results.

uncertainties due to stellar activity measurements
We collect 186 stars that have S-index measurements from both the LAMOST low-resolution spectra and literature high-resolution spectra (Duncan et al. 1991, 73 stars; Wright et al. 2004, 17 stars; Isaacson & Fischer 2010, 74 stars; Jenkins et al. 2011, 11 stars; Gomes da Silva et al. 2021, 11 stars). All the literature S-index values have been calibrated to the MWO scale. The effective temperatures of the collected stars span from 3929 K to 7392 K. For our purpose, we limit the analysis to stars with 5400 < T_eff < 6500 K and log g > 3, which leaves 124 stars. The left panel of Figure 8 compares the S-index from LAMOST with the literature values, while the middle panel shows the same comparison focused on stars with lower activity.
The LAMOST measurements are in good agreement with the literature values, but there is a systematic trend that deviates from the 1:1 line. A linear fit to the trend yields

$$S_{\rm MWO} = 2.257(\pm 0.132)\, S_{\rm LAMOST} - 0.270(\pm 0.032). \tag{5}$$

Such a systematic trend is a consequence of the different spectral resolutions used for the S-index measurements. Beyond the trend, there is also a star-to-star scatter about the linear fit, which is 0.068 for the overall sample and 0.047 for stars with S_MWO < 0.4. We believe such a scatter is mainly caused by intrinsic temporal variations of the stellar activity levels, as the LAMOST and literature spectra were taken at different epochs spanning decades.
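As a minimal sketch, Equation 5 can be applied to place a LAMOST S-index on the MWO scale. The function name `lamost_to_mwo` and the first-order error propagation, which treats the slope, intercept, and measurement uncertainties as independent, are illustrative assumptions rather than part of the original analysis.

```python
import math

# Coefficients of Equation 5 (linear fit of S_MWO against S_LAMOST).
SLOPE, SLOPE_ERR = 2.257, 0.132
INTERCEPT, INTERCEPT_ERR = -0.270, 0.032


def lamost_to_mwo(s_lamost, s_lamost_err=0.0):
    """Return (S_MWO, formal error) for a LAMOST S-index.

    The error treats the slope, intercept, and measurement uncertainties as
    independent (an assumption: the fit covariance is not given in the text).
    """
    s_mwo = SLOPE * s_lamost + INTERCEPT
    err = math.sqrt(
        (s_lamost * SLOPE_ERR) ** 2
        + INTERCEPT_ERR ** 2
        + (SLOPE * s_lamost_err) ** 2
    )
    return s_mwo, err
```

The formal error returned here does not include the star-to-star scatter about the fit (0.068 overall, 0.047 for S_MWO < 0.4), which would need to be added separately when comparing spectra taken at different epochs.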
To test this hypothesis, we made an independent check by directly degrading high-resolution spectra to the LAMOST resolution. We adopt the MELCHIORS database (R = 85,000; Royer et al. 2024), from which we selected 56 spectra of F/G/K-type dwarf stars with 5400 < T eff < 6500 K, based on stellar parameters derived from Gaia BP/RP spectra (De Angeli et al. 2023; Carrasco et al. 2021) with the DD-Payne (Xiang et al. in prep.). We degrade each high-resolution spectrum to 11 low-resolution spectra whose resolving powers are drawn randomly within ±20% of the mean LAMOST resolution, mimicking the fiber-to-fiber variation of the LAMOST spectral resolution (e.g. Xiang et al. 2015).
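The degradation step can be sketched as a Gaussian convolution. This is a hedged illustration only: the nominal LAMOST resolving power of 1800, the single kernel width evaluated at the mean wavelength (reasonable only for a narrow window such as the Ca II H and K region), the uniform wavelength sampling, the uniform random draw, and the function names `degrade` and `degraded_set` are all assumptions, not details taken from the text.

```python
import numpy as np

R_HIGH = 85_000   # MELCHIORS resolving power quoted in the text
R_LAMOST = 1_800  # assumed nominal LAMOST low-resolution resolving power


def degrade(wave, flux, r_low, r_high=R_HIGH):
    """Convolve a uniformly sampled spectrum with a Gaussian kernel to r_low."""
    lam0 = np.mean(wave)
    step = wave[1] - wave[0]
    # Kernel FWHM is the quadrature difference of the two line-spread widths.
    fwhm = lam0 * np.sqrt(1.0 / r_low**2 - 1.0 / r_high**2)
    sigma_pix = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / step
    half = int(np.ceil(5 * sigma_pix))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_pix) ** 2)
    kernel /= kernel.sum()
    # Pad with edge values so the output has the same length as the input.
    padded = np.pad(flux, half, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def degraded_set(wave, flux, n=11, spread=0.2, seed=0):
    """Return n degraded spectra with resolving powers within +/-spread."""
    rng = np.random.default_rng(seed)
    r_values = R_LAMOST * (1.0 + rng.uniform(-spread, spread, size=n))
    return r_values, [degrade(wave, flux, r) for r in r_values]
```

Measuring the S-index on each of the 11 degraded spectra and comparing it with the value from the original spectrum isolates the effect of resolution from that of temporal variability, since all spectra share the same epoch.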
The right panel of Figure 8 compares the S-index measured from the MELCHIORS high-resolution spectra with that measured from the same spectra degraded to the LAMOST resolution. It shows a systematic trend similar to that in the left and middle panels, confirming that the trend is mainly a consequence of the different resolutions. However, the star-to-star scatter is only 0.017, which is consistent with the measurement errors of the S-index (Sect. 2) but much smaller than the scatter in the middle panel (0.047). This supports the interpretation that the star-to-star scatter between the LAMOST and literature S-index values shown in the left and middle panels is likely due to intrinsic temporal variations of stellar activity.
For solar-type stars, an extra uncertainty of ∼0.047 in the S-index due to temporal variations of stellar activity causes a median uncertainty of 15 ppm in the rrmsCDPP for the 6.0-hr case. Considering that the Kepler photometric observations and the LAMOST spectroscopic observations used in this study were carried out at similar epochs, typically differing by ≲ 5 years, we expect the effect of temporal variations of stellar activity to be insignificant for our rrmsCDPP prediction, except for young and active stars with rapid activity variations. This also implies that, to obtain a good estimate of the stellar intrinsic noise with our method for future planet detection surveys, the stellar activity should be measured from spectra taken at an epoch similar to that of the photometric observations.

Stellar activity - intrinsic photometric noise relation on the MWO scale
Irrespective of the temporal variation of stellar activity, we have repeated the analyses of Figures 2 and 3 using the S-index calibrated to the MWO scale. Figure 9 shows that the positive relations between the S-index on the MWO scale and the intrinsic rrmsCDPP hold for each integration duration. As illustrated by Figure 10, typical values of the intrinsic rrmsCDPP are 20-30 ppm for solar-type stars with S-index lower than 0.22. Beyond this inflection point, the positive relations become significant. These results are consistent with those obtained using the S-index derived from the LAMOST spectra (Figure 3).
6. CONCLUSION

Stellar intrinsic photometric noise arising from magnetic activity is a main source of interference in detecting exoplanets using transit methods. In this work, we have investigated the relation between the stellar intrinsic photometric noise, as quantified by the Kepler rrmsCDPP, and the stellar chromospheric activity S-index derived from the LAMOST survey spectra. Our results reveal that, for solar-type stars, there exists a clear positive correlation between the S-index and rrmsCDPP. Inactive stars with S-index lower than ∼0.22 mainly possess low intrinsic noise, with rrmsCDPP values of 20-30 ppm, while the intrinsic noise increases dramatically for more active stars with S-index higher than 0.22. The correlation also shows a clear dependence on the photometric integration duration, becoming stronger for longer integration durations.
We have then built an empirical relation between the intrinsic noise and the stellar labels, including the S-index, T eff, log g, [Fe/H], and apparent magnitude, using the XGBoost regression algorithm. Internal and external examinations suggest that the relation is robust, and our approach achieves a typical precision of 20 ppm for inferring the intrinsic noise from the S-index and other stellar labels on a 6-hr integration duration. We have applied this empirical relation to the full LAMOST spectra database and obtained intrinsic noise predictions for 1,358,275 stars. The resultant catalog is publicly available and expected to be valuable for optimizing target selection for future exoplanet-hunting space missions, such as the Earth 2.0 mission.

Figure 1. Observed rrmsCDPP on the 6.5-hr timescale as a function of stellar apparent magnitude in the Gaia G bandpass for Kepler stars. The red squares at the lower envelope represent the 0.5th percentile of the points in bins 0.2 mag wide. The red curve represents a 4th-order polynomial fit to the red squares, which serves as a lower boundary of the rrmsCDPP as a function of magnitude.
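The lower-envelope construction described in the caption of Figure 1 can be sketched as follows. This is a sketch under stated assumptions: the function name `lower_envelope`, the `min_count` threshold for skipping sparsely populated bins, and the fit in linear (rather than logarithmic) rrmsCDPP are hypothetical choices that are not specified in the text.

```python
import numpy as np


def lower_envelope(mag, cdpp, bin_width=0.2, percentile=0.5, order=4,
                   min_count=20):
    """Fit a polynomial to the low-percentile envelope of cdpp versus mag."""
    mag = np.asarray(mag, dtype=float)
    cdpp = np.asarray(cdpp, dtype=float)
    edges = np.arange(mag.min(), mag.max() + bin_width, bin_width)
    centers, floors = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (mag >= lo) & (mag < hi)
        # Skip sparse bins, where a 0.5th percentile is poorly defined.
        if in_bin.sum() >= min_count:
            centers.append(0.5 * (lo + hi))
            floors.append(np.percentile(cdpp[in_bin], percentile))
    # Polynomial coefficients, highest order first (use with np.polyval).
    coeffs = np.polyfit(centers, floors, order)
    return np.array(centers), np.array(floors), coeffs
```

Evaluating `np.polyval(coeffs, mag)` then gives the magnitude-dependent noise floor at any magnitude within the fitted range.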

Figure 2. S-index versus intrinsic rrmsCDPP for different integration durations for Sun-like stars with magnitudes in the range 11.5 < Kp < 12.5. The 6.5-hr rrmsCDPP values are estimated by dividing the available 6-hr rrmsCDPP values by $(13/12)^{0.5} = 1.041$. The red dots connected by lines in each panel represent the median values of the intrinsic rrmsCDPP in 6 S-index bins. The vertical line segments indicate the standard errors of the intrinsic rrmsCDPP within each bin. The orange square with error bars in the bottom-left panel represents the Sun.

Figure 3. The median values of the intrinsic rrmsCDPP as a function of the average S-index in 6 bins (see Section 3.2 for more details) for Sun-like stars with magnitudes in the range 11.5 < Kp < 12.5. The error bars indicate the standard errors of the intrinsic rrmsCDPP within the bins.

Figure 4. Comparison of the observed and predicted rrmsCDPP on the 1.5-hr (left panels), 6.0-hr (middle panels), and 15.0-hr (right panels) transit timescales. The lower panels show the respective residuals between the observed and predicted rrmsCDPP. The dashed lines represent perfect agreement. The mean bias (µ) and the corresponding scatter (σ) are shown in the upper panels.

Figure 5. The same as Figure 4 but for the intrinsic rrmsCDPP, obtained by removing the Poisson noise and instrumental noise from the observed rrmsCDPP using Equations 1 and 2.

Figure 6. Residuals between the intrinsic and predicted rrmsCDPP on the 6.0-hr timescale as functions of individual labels for the whole (dwarfs and giants) test set (panel a) and for the dwarfs (panels b, c, d, and e), color-coded by the stellar number density. The dashed lines represent perfect agreement. Red dots and vertical error bars are the medians and standard deviations of the residuals within 15 bins for each label.

Figure 7. Distribution of 6-hr intrinsic rrmsCDPP values for the generalization set in the T eff - log g plane, in bins of 20 K by 0.02 dex, color-coded by the median value of rrmsCDPP on a logarithmic scale. The black lines show evolutionary tracks from PARSEC v2.0 (Nguyen et al. 2022), with the mass and metallicity shown in the panel. The two black dots along each track indicate the starting points of the main-sequence and subgiant phases.

Figure 8. Left: Calibration of the S-index from the LAMOST scale to the MWO scale using 124 stars within 5400 < T eff < 6500 K, with S_MWO values taken from the literature sources shown in the plot. Error bars are only shown for the LAMOST measurements, as errors for the literature S_MWO values are not available. In many cases, the error bars are smaller than the symbol size. Middle: Same as the left panel but for stars with S_MWO < 0.4. Right: Comparison of the S-index measured from high-resolution spectra for 56 dwarf stars with that measured from the same spectra degraded to low resolving power. Each high-resolution spectrum is degraded to 11 low-resolution spectra, for which the resolving power is set to the LAMOST resolution plus a random offset within ±20 per cent (see text). The dashed lines in all the panels delineate the 1:1 line. The solid black lines represent the best linear fits, with the slope (a), intercept (b), and standard deviation of the fitting residuals marked in the plot.

Figure 9. The same as Figure 2 but using the S-index calibrated to the MWO scale. The orange square with error bars represents the Sun, with the S-index value on the MWO scale during activity cycles 15-24 from Egeland et al. (2017) and the intrinsic solar noise value from Gilliland et al. (2011).

Figure 10. The same as Figure 3 but using the S-index calibrated to the MWO scale.